PLENARY


High-order  Cyclostationary  Signal  Analysis:  An  overview 


Dr.  Georgios  Giannakis,  Univ.  of  Virginia  (USA) 

Processes encountered in statistical signal processing, communications, and time series analysis
applications are often assumed stationary. Owing to the varying nature of physical phenomena and of certain
man-made operations, however, time-invariance and the related notion of stationarity are often violated in
practice. In this talk, I will focus on cyclostationary processes, which are characterized by the periodicity
they exhibit in their second- and/or higher-order statistical descriptors. Periodicity is omnipresent in real-life
problems entailing phenomena and operations of a repetitive nature: communications, geophysical and
atmospheric sciences (hydrology, oceanography, meteorology, climatology), rotating machinery,
econometrics, and biological systems. Background material will deal with polyspectral representations,
sample estimation of cyclic statistics, testing a time series for second- and high-order cyclostationarity, and
noise suppression by joint exploitation of cyclostationarity and non-Gaussianity. The diversity offered by
periodic variations will be emphasized in the context of blind identification of time-invariant and
periodically varying systems and separation of cyclostationary signals on the basis of their cycles. Specific
applications will include time delay estimation, harmonic retrieval in the presence of multiplicative noise,
modeling with systematically missing observations, polynomial phase signals for modeling motion-induced
variations, equalization of random channels, and generalized differential encoding of communication

Autoregressive Modeling of Lung Sounds Using Higher-Order Statistics:
Estimation  of  Source  and  Transmission 

Leontios  J.  Hadjileontiadis  and  Stavros  M.  Panas 

Department  of  Electrical  &  Computer  Engineering,  Division  of  Medical  Signal  and  Image  Processing, 
Aristotle  University  of  Thessaloniki,  GR-540  06  Thessaloniki,  Greece. 

(Tel/Fax:  +3031-9963-03/12,  e-mail:  leontios@ccf.auth.gr,  panas@vergina.eng.auth.gr) 


Abstract 

The use of higher-order statistics in autoregressive modeling of lung sounds is presented, resulting in a characterization of their source and transmission. The lung sound source in the airway is estimated using the prediction error of an all-pole filter based on higher-order statistics (AR-HOS), while the acoustic transmission through the lung parenchyma and chest wall is modeled by the transfer function of the same AR-HOS filter. The parametric bispectrum, using the estimated $a_i$ coefficients of the AR-HOS model, is also calculated to elucidate the frequency characteristics of the modeled system. The approach was applied to pre-classified lung sound segments in known disease conditions, selected from teaching tapes. Experiments have shown that this method yields a reliable estimation of lung sound characteristics, consistent with current knowledge, even in the presence of additive Gaussian noise.


1.  Introduction 

Pulmonary diagnosis is often based on the analysis of acoustical pulmonary signals, since the acoustical energy produced by air flow during inspiration and expiration is highly correlated with pulmonary dysfunction. This dysfunction is caused by anatomical or physiological changes in the pulmonary system and is characterized by changes in the acoustical properties of the various parts or organs involved [1]. Describing the influence of a disease on the production and/or transmission characteristics of lung sounds is of great importance, since it can give insight into the causes of the production mechanism of the disease.

In this paper, a characterization of the source and transmission of lung sounds by means of autoregressive modeling based on higher-order statistics (AR-HOS) is presented. The model performance is evaluated on pre-classified lung sounds included in teaching tapes, recorded from patients with various kinds of lung diseases. Results from the applied AR-HOS model indicate that it provides an efficient and objective approach to modeling lung sounds.

2.  AR-HOS  modeling  of  lung  sounds 

In  this  section,  the  description  of  the  proposed  AR-HOS 
modeling  of  lung  sounds  is  presented. 


Fig. 1. Schematic diagram of the AR-HOS model employed for lung sounds analysis.


According to Fig. 1, lung sounds, originating inside the airways of the lung, are modeled as the input to an all-pole filter (AR-HOS), which describes the transmission of lung sounds through the parenchyma and chest wall structure. The output of this filter is considered to be the lung sounds recorded at the chest wall. These sounds also contain heart sound interference, the reduction of which has been addressed recently by the authors using fourth-order statistics in [2] and other pending publications. Muscle and skin noise, along with instrumentation noise, are modeled as additive Gaussian noise.

Using the model hypothesized above, given a signal sequence of lung sounds at the chest wall, an autoregressive analysis can be applied to compute the model parameters, and therefore the source and transmission filter characteristics can be estimated separately [3]. This becomes clear from the following analysis.

2.1 Modeling of transmission of lung sounds


The AR-HOS prediction filter performs autoregressive prediction based on third-order statistics (AR-TOS). The equation describing the autoregressive model is:

$$ y_n + \sum_{i=1}^{p} a_i\, y_{n-i} = w_n, \qquad (1) $$

where $y_n$ represents a $p$-th order AR process of $N$ samples ($n = 0, \ldots, N-1$), $a_i$ are the coefficients of the AR model, and $w_n$ are i.i.d., non-Gaussian, third-order stationary, zero-mean, with $E\{w_n^3\} = \beta \neq 0$ and $y_n$ independent of $w_l$ for $n < l$.

Since $w_n$ is third-order stationary, $y_n$ is also third-order stationary, assuming it is a stable AR model. For the model of Eqn. (1) we can write the cumulant-based 'normal' equations [4]:

$$ R(\tau_1, \tau_2) + \sum_{i=1}^{p} a_i\, R(\tau_1 - i, \tau_2) = 0, \quad \tau_1 = 1, \ldots, p \;\; \& \;\; \tau_2 = -p, \ldots, 0, \qquad (2) $$

where $R(\tau_1, \tau_2)$ is the third-order cumulant sequence (TOS) of the AR process. In practice we use sample estimates of the cumulants. Eqn. (2) yields consistent estimates of the AR parameters, maintaining the orthogonality of the prediction error sequence to an instrumental process derived from the data [5]. The calculation of the order $p$ of the AR-TOS model is reduced to a rank determination problem. According to Giannakis and Mendel [6], the rank of a matrix $C_e$ formed by exact third-order cumulants is equal to $p$, even when only $\bar{p}$, the upper bound of $p$, is known. A subjective rule to select the 'effective' AR order $p$ is to find the largest drop between two successive normalized singular values of the sample estimate of $C_e$. Estimates of the transfer function $H(\omega)$ of the system and the normalized parametric bispectrum $C_3^y$ can be derived using the following equations, respectively [7]:

$$ H(\omega) = \frac{1}{1 + \sum_{i=1}^{p} a_i\, e^{-j\omega i}}, \qquad (3) $$

$$ C_3^y(\omega_1, \omega_2) = \beta\, H(\omega_1)\, H(\omega_2)\, H^*(\omega_1 + \omega_2), \qquad (4) $$

where $*$ denotes complex conjugation and $\beta = E\{w_n^3\}$.
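The evaluation of the transfer function and of the parametric bispectrum from a set of AR coefficients can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the sign convention $H(\omega) = 1/(1 + \sum_i a_i e^{-j\omega i})$, normalizes with $\beta = 1$ by default, and the function names, the frequency grid size and the demo coefficient are hypothetical.

```python
import numpy as np

def ar_transfer_function(a, omega):
    """H(w) = 1 / (1 + sum_{i=1..p} a_i * exp(-1j*w*i)); `a` holds a_1..a_p."""
    a = np.asarray(a, dtype=float)
    omega = np.asarray(omega, dtype=float)
    i = np.arange(1, a.size + 1)
    # Broadcast the frequencies against the lag index i, then sum over i.
    denom = 1.0 + np.exp(-1j * omega[..., None] * i) @ a
    return 1.0 / denom

def parametric_bispectrum(a, n_freq=64, beta=1.0):
    """beta * H(w1) * H(w2) * conj(H(w1 + w2)) on an n_freq x n_freq grid."""
    omega = np.linspace(-np.pi, np.pi, n_freq, endpoint=False)
    h = ar_transfer_function(a, omega)
    h_sum = ar_transfer_function(a, omega[:, None] + omega[None, :])
    return omega, beta * h[:, None] * h[None, :] * np.conj(h_sum)

# Illustrative first-order model (assumed value): y_n - 0.5 y_{n-1} = w_n.
a_demo = [-0.5]
omega, bisp = parametric_bispectrum(a_demo)
h0 = ar_transfer_function(a_demo, 0.0)
```

The magnitude `np.abs(bisp)` is what a plot such as a parametric bispectrum panel would display; the grid is in radians per sample and would be rescaled by the sampling rate to obtain Hz.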

The most profound motivations behind the use of third-order statistics in the estimation of $a_i$ are [7]: i) suppression of Gaussian noise, since third-order statistics of Gaussian signals are identically zero, and ii) preservation of the true phase character of the signal, unlike the second-order statistics (autocorrelation). Hence, when the analysis waveform consists of a non-Gaussian signal in additive symmetric noise (e.g. Gaussian), the parameter estimation of the original signal using third-order statistics takes place in a high signal-to-noise ratio (SNR) domain and the whole parametric presentation of the process is more accurate and reliable [4].
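The cumulant-based normal equations can be illustrated with a short sketch. It assumes the model convention $y_n + \sum_i a_i y_{n-i} = w_n$, uses biased sample estimates of the third-order cumulants, and solves the overdetermined system for $\tau_1 = 1, \ldots, p$, $\tau_2 = -p, \ldots, 0$ by least squares. The function names, the synthetic AR(2) coefficients, the record length and the noise level are all assumptions made for the demonstration.

```python
import numpy as np

def third_order_cumulant(y, tau1, tau2):
    """Biased sample estimate of R(tau1, tau2) = E[y(n) y(n+tau1) y(n+tau2)]."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = y.size
    lo = max(0, -tau1, -tau2)
    hi = min(n, n - tau1, n - tau2)
    k = np.arange(lo, hi)
    return np.sum(y[k] * y[k + tau1] * y[k + tau2]) / n

def ar_tos_fit(y, p):
    """Least-squares solution of R(t1,t2) + sum_i a_i R(t1-i,t2) = 0."""
    rows, rhs = [], []
    for t1 in range(1, p + 1):
        for t2 in range(-p, 1):
            rows.append([third_order_cumulant(y, t1 - i, t2)
                         for i in range(1, p + 1)])
            rhs.append(-third_order_cumulant(y, t1, t2))
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return a

# Synthetic check (assumed parameters): skewed i.i.d. input through an AR(2)
# filter, observed in additive Gaussian noise.
rng = np.random.default_rng(0)
n_samples = 50_000
w = rng.exponential(1.0, n_samples) - 1.0      # zero-mean, non-Gaussian
a_true = np.array([-0.5, 0.2])
y = np.zeros(n_samples)
for k in range(n_samples):
    y[k] = w[k]
    for i, ai in enumerate(a_true, start=1):
        if k - i >= 0:
            y[k] -= ai * y[k - i]
y_noisy = y + rng.normal(0.0, 0.3, n_samples)  # Gaussian noise
a_hat = ar_tos_fit(y_noisy, p=2)
```

Because the third-order cumulants of the Gaussian noise vanish asymptotically, `a_hat` should stay close to `a_true` despite the added noise, which is the first motivation listed above.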

2.2  Modeling  of  source  of  lung  sounds 

The model used for the lung sounds originating inside the airways of the lung is depicted in Fig. 2. According to Fig. 2, the lung sound source consists of three kinds of noise sequences [11]. The first sequence (periodic impulse) describes the wheeze and rhonchus sources, since they have characteristic distinct pitches associated with them, and they are believed to be produced by periodic oscillations of the air and airway walls [8]-[11]. The second sequence (random intermittent impulse) describes the crackle sources, since they are believed to be produced by sudden opening/closing of airways or bubbling of air through extraneous liquids in the airways, both phenomena associated with sudden intermittent bursts of sound energy [8]-[11]. Finally, the third sequence (white non-Gaussian noise) describes the breath sound sources, since they are believed to be produced by turbulent flow in a large range of airway dimensions [8]-[11]. An additive combination of these sequences can produce almost all kinds of lung sounds, resulting in a complete description of the lung sound sources.


[Figure: block diagram in which the lung sound source is formed as the sum of a periodic impulse sequence, a random intermittent impulse sequence, and white non-Gaussian noise.]

Fig.  2.  The  lung  sound  source  model. 

The  estimation  of  the  AR-TOS  model  input  (lung  sound 
source)  can  be  derived  from  the  prediction  error  of  Eqn. 
(1),  by  means  of  inverse  filtering. 
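The inverse-filtering step can be sketched as below: once the coefficients $a_i$ are available, the source estimate is the prediction error $\hat{w}_n = y_n + \sum_i a_i y_{n-i}$, i.e. the recorded signal passed through the FIR filter $[1, a_1, \ldots, a_p]$. The helper names, the demo coefficients and the 10-sample impulse period (0.004 s at the 2.5 kHz sampling rate used in this paper) are illustrative assumptions.

```python
import numpy as np

def ar_synthesize(source, a):
    """All-pole filtering: y[n] = source[n] - sum_i a_i * y[n-i]."""
    y = np.zeros(len(source))
    for n in range(len(source)):
        acc = source[n]
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * y[n - i]
        y[n] = acc
    return y

def prediction_error(y, a):
    """Inverse (FIR) filtering with zero initial conditions."""
    fir = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    return np.convolve(np.asarray(y, dtype=float), fir)[: len(y)]

# Periodic impulse source with a 10-sample period (assumed demo values).
source = np.zeros(100)
source[::10] = 1.0
a_demo = [-1.2, 0.7]                  # stable AR(2) transmission filter
recorded = ar_synthesize(source, a_demo)
recovered = prediction_error(recorded, a_demo)
```

With exact coefficients the recovered sequence equals the source; with estimated coefficients it is only an approximation whose quality depends on the AR fit.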

3.  Implementation 

Pre-classified signals, each 25 s in duration, representing typical diseases affecting the airways, parenchyma, and chest wall, were selected from teaching tapes [8]-[10]. After antialiasing filtering, the signals were digitized with a 12-bit A/D converter at a sampling rate of 2.5 kHz. The signals (sections of N=1024, 2048, 4096) were subjected to AR-HOS analysis and third-order statistics were estimated by averaging the cumulants of successive subsections (M=5). The value of order p ranged from 2 to 11.
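The averaging of cumulants over successive subsections can be sketched as follows. The sketch assumes that M denotes the number of non-overlapping subsections of a section and that each subsection is mean-removed before its biased cumulant estimate is formed; the function names, the maximum lag and the synthetic test signal are hypothetical.

```python
import numpy as np

def third_order_cumulants(y, max_lag):
    """Matrix of biased estimates R(t1, t2) for |t1|, |t2| <= max_lag."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = y.size
    lags = np.arange(-max_lag, max_lag + 1)
    c = np.zeros((lags.size, lags.size))
    for r, t1 in enumerate(lags):
        for col, t2 in enumerate(lags):
            lo = max(0, -t1, -t2)
            hi = min(n, n - t1, n - t2)
            k = np.arange(lo, hi)
            c[r, col] = np.sum(y[k] * y[k + t1] * y[k + t2]) / n
    return c

def averaged_cumulants(y, n_sub, max_lag):
    """Average the cumulant estimates of n_sub successive subsections."""
    segments = np.array_split(np.asarray(y, dtype=float), n_sub)
    return np.mean([third_order_cumulants(s, max_lag) for s in segments],
                   axis=0)

# One section of N = 1024 samples split into M = 5 subsections.
rng = np.random.default_rng(0)
section = rng.exponential(1.0, 1024)   # stand-in for a recorded section
c_avg = averaged_cumulants(section, n_sub=5, max_lag=4)
```

Averaging over subsections trades lag resolution for lower variance of the estimates; the resulting matrix is symmetric in its two lags, as the third-order cumulant itself is.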

4. Results and Discussion

The results obtained by means of AR-TOS modeling of lung sounds are presented in Table I.


Table I. Results of AR-TOS modeling of lung sounds.

| Description of segment | Associated phenomenon | Estimated source | Transmission filter | p |
|---|---|---|---|---|
| Bronchial (insp.) | Normal Breathing | White NGN | LPF (0-550 Hz) | 2 |
| Bronchial (insp.) | Pneumonia (early) | White NGN | LPF (0-700 Hz) | 4 |
| Vesicular (insp.) | Normal Breathing | White NGN | LPF (0-500 Hz) | 4 |
| Bronchovesicular (insp.) | Normal Breathing | White NGN | LPF (0-450 Hz) | 2 |
| Tracheal (insp.) | Normal Breathing | White NGN | LPF (0-700 Hz) | 6 |
| Wheeze (expir.) | Asthma | Periodic Impulse (T=0.004 s) | LPF (0-550 Hz), d.p. at 250, 500 Hz | 10 |
| Early Inspiratory Crackles (insp.) | Airway Obstruction | Random Intermittent Impulses + White NGN | LPF (0-600 Hz) | 4 |
| Late Inspiratory Crackles (insp.) | Airway Restriction | Random Intermittent Impulses + White NGN | LPF (0-600 Hz) | 11 |
| Fine Crackles (insp.) | Pulmonary Fibrosis | Random Intermittent Impulses + White NGN | BPF (350-700 Hz), c.f. 530 Hz | 2 |
| Coarse Crackles (insp.) | Chronic Bronchitis | Random Intermittent Impulses + White NGN | BPF (200-400 Hz), c.f. 300 Hz | 2 |
| Squawks (insp.) | Allergic Alveolitis and Interstitial fibrosis | Random Intermittent Impulses + White NGN | BPF (180-480 Hz), c.f. 380 Hz | 2 |
| Stridors (insp.) | Croup | Periodic Impulse (T=0.006 s) | LPF with d.p. at 380, 760 Hz | 4 |
| Pleural Friction Rub (insp.) | Inflammatory Pleural | Random Intermittent Impulses + White NGN | LPF (0-400 Hz) | 2 |

Fig. 3 shows the results of AR-TOS modeling of an inspiratory segment of normal bronchial breath sounds [Fig. 3(a) and (c)]. The source [Fig. 3(b) and (d)] was found to be random white noise. The order p of the AR-TOS model was found equal to 2 [Fig. 3(e)]. The estimated filter response [Fig. 3(f)] was noted to be essentially the same over the breath segment [Fig. 3(g)], indicating a primarily stationary filter response over the inspiratory phase. The typical response [Fig. 3(f)] was found to have a lowpass characteristic, with a frequency band range around 0-550 Hz. According to the bispectrum [Fig. 3(h)], the power of the low frequencies was found to be rather low, while the central frequency was located around 250 Hz.

NGN: Non-Gaussian Noise; c.f.: central frequency; L/BPF: Low/Band Pass Filter; d.p.: distinct peaks.

Representative results of AR-TOS modeling of lung sounds (normal bronchial sounds, wheezes, fine crackles and squawks) are shown in Figs. 3, 4, 5, and 6, respectively. The lack of energy noted in the low-frequency range (0-100 Hz) is attributed to analog highpass filters used prior to the recording of these sounds to avoid heart sounds, muscle, and skin noise [8]-[10].


Fig. 3. (a) A time section of 1.63 s of normal bronchial sounds (inspiration). (b) The estimated lung sound source. (c) The estimated power spectrum of the recorded sound. (d) The estimated power spectrum of the estimated source. (e) The estimation of p. (f) The estimated filter frequency response. (g) The estimated filter frequency response vs. time. (h) The estimated magnitude of the parametric bispectrum of recorded sounds.

Fig. 4 shows the results of AR-TOS modeling of an expiratory asthmatic segment [Fig. 4(a) and (c)]. The estimated source [Fig. 4(b1,b2) and (d)] was found to be a periodic train of impulses (with period T≈0.004 s). The order p of the AR-TOS model was found equal to 10 [Fig. 4(e)]. The estimated filter response [Fig. 4(f)] was noted to be essentially the same over the breath segment [Fig. 4(g)], indicating a primarily stationary filter response over the expiratory phase. The typical response [Fig. 4(f)] was found to have a lowpass characteristic, with a frequency band range around 0-550 Hz, and two distinct resonance peaks at 250 Hz and 500 Hz. According to the bispectrum [Fig. 4(h)], the power of the signal is located at the resonance frequencies, a result which is consistent with the production mechanism of wheezes, i.e. distinct resonance in the transmission path (airway walls) [8]-[10].


Fig. 4. (a) A time section of 1.63 s of asthmatic wheezes (expiration). (b)-(h) The same as (b)-(h) in Fig. 3.

In the case of the inspiratory segment of fine crackles [Fig. 5(a) and (c)], the estimated source waveform [Fig. 5(b) and (d)] contains impulsive bursts, corresponding to fine crackles, and white NGN. This could be explained by the phenomenon associated with fine crackles, namely the explosive reopening of small airways that had closed during the previous expiration. The order p of the AR-TOS model was found equal to 2 [Fig. 5(e)]. The estimated filter response [Fig. 5(f)] was noted to be essentially the same over the breath segment [Fig. 5(g)], indicating a primarily stationary filter response over the inspiratory phase. The typical response [Fig. 5(f)] was found to have a bandpass characteristic, with a frequency band range around 350-700 Hz and central frequency at 530 Hz. According to the bispectrum [Fig. 5(h)], the power of the signal is shifted to higher frequencies. These results are consistent with the understanding of pulmonary fibrosis, since the associated abnormal airway closure that precedes the 'crackling' reopening is due to increased lung stiffness, which probably causes the transmission of higher frequencies [8]-[10].

Fig. 5. (a) A time section of 0.4096 s of fine crackles (inspiration). (b)-(h) The same as (b)-(h) in Fig. 3.


In the case of squawks [Fig. 6(a) and (c)], the estimated source [Fig. 6(b) and (d)] combines impulsive bursts followed by a short, almost exponentially decaying periodic train of impulses. This result is consistent with the accepted theory that squawks are produced by the explosive opening and decaying fluttering of an unstable airway [8]. The order p of the AR-TOS model was found equal to 2 [Fig. 6(e)]. The estimated filter response [Fig. 6(f)] was noted to be essentially the same over the breath segment [Fig. 6(g)], indicating a primarily stationary filter response over the inspiratory phase. The typical response [Fig. 6(f)] was found to have a bandpass characteristic, with a frequency band range around 180-480 Hz and central frequency at 380 Hz. According to the bispectrum [Fig. 6(h)], the power of the signal is shifted to lower frequencies, with smooth characteristics.

Fig. 6. (a) A time section of 0.4096 s of squawks (inspiration). (b)-(h) The same as (b)-(h) in Fig. 3.


The robustness of the AR-HOS model to symmetric noise has been tested on the same signals, contaminated with additive colored Gaussian noise (0, 0.03). Figs. 7(a) and (b) depict the estimated parametric bispectrum of AR-HOS modeling of noisy fine crackles and squawks; the main frequency characteristics remain unchanged compared to Figs. 5(h) and 6(h), respectively. From the derived results it is evident that the AR-HOS modeling of lung sounds characterizes their source and transmission efficiently, providing useful diagnostic information to clinicians.


Fig. 7. Estimated parametric bispectrum of noisy (a) fine crackles, (b) squawks.


5.  Summary 

An autoregressive analysis of lung sounds based on higher-order statistics (HOS) was presented in this work. Experiments showed that HOS can estimate the source and transmission characteristics of lung sounds efficiently, even in the presence of additive symmetric noise.

Abstract

This paper addresses the problem of time delay estimation (TDE) in spatially correlated noises. Two cases are considered: (i) TDE of a non-Gaussian signal in spatially correlated Gaussian noises; and (ii) TDE of a Gaussian signal in spatially correlated non-Gaussian noises. For the first case, a new approach based upon the use of higher order statistics of the measurements is proposed; for the second case, a hybrid approach using higher and second order statistics of the measurements is suggested. Simulation examples are presented to illustrate the effectiveness of these approaches.


1.  Introduction 

Time delay estimation (TDE) between the signals received at two spatially separated sensors has found applications in many areas such as radar, sonar, biomedicine and geophysics. The traditional methods [1] are based on shifting one measurement with respect to the other to compare the similarities between the two records and then choosing the shift at which the best match occurs. In theory, these methods yield consistent time delay estimates when the noises are uncorrelated or their cross-correlation functions are known; however, they may yield ambiguous results when the noise processes are spatially correlated.

Recently, higher order statistics have found wide application in the signal processing field. In those practical settings where the noise processes are jointly Gaussian, possibly correlated, and the signal is non-Gaussian, several TDE approaches have been proposed based on higher order cumulants of the measurements [2,3]. Among these approaches, the time domain approach of [2] may fail to work when the cumulant matrix constructed is ill conditioned. On the other hand, although Tugnait's approaches [3] do not suffer from this limitation, they are applicable only to the fourth order cumulant case. Motivated by the above observations, the first objective of this paper is to study the problem of multipath TDE of non-Gaussian signals in spatially correlated Gaussian noises, referred to as the NSGN case, and to propose new normal equations which yield a unique solution and are applicable to any higher than second order case.

On the other hand, colored non-Gaussian noise environments are known to exist widely in practical applications. For example, Hinich et al. [4] have shown that ship radiated noise is non-Gaussian with non-zero bispectrum. Some types of underwater noises are definitely non-Gaussian [5]. Moreover, the Gaussianity of the signal has been assumed and tested in conventional TDE methods [1]. Therefore, another problem that needs to be solved is that of estimating multipath time delays of a Gaussian signal in unknown spatially correlated non-Gaussian noises, referred to as the GSNN case, which is the second objective of this paper. We propose a hybrid approach for solving this problem by using second and higher order statistics of the measurements.

2.  TDE  in  NSGN  Case 

In this section, we study the problem of TDE of a non-Gaussian signal in spatially correlated Gaussian noises.

2.1. Model Assumptions

Let $x(n)$ and $y(n)$ denote two sensor measurements satisfying

$$ x(n) = s(n) + w_1(n) \qquad (1) $$

$$ y(n) = \sum_{j=1}^{M} A_j\, s(n - D_j) + w_2(n) \qquad (2) $$

where $s(n)$ is an unknown signal, $M$ is the multipath number, and $A_j$ and $s(n - D_j)$ are the amplitude attenuation and delayed (or advanced) version of the $j$th path reflection, respectively.

ASa1) The signal $s(n)$ is a zero mean, non-Gaussian linear random process, given by

$$ s(n) = \sum_{k} h(k)\, e(n - k) \qquad (3) $$

where $\sum_{i=-\infty}^{\infty} |h(i)| < \infty$, $H(0) = \sum_{i=-\infty}^{\infty} h(i) \neq 0$, and $e(n)$ is a zero mean, independently and identically distributed (i.i.d.), non-Gaussian process with $k$th ($k > 2$) order cumulant denoted by $\gamma_{ke}$ ($\gamma_{ke} \neq 0$);

ASa2) The noises $w_1(n)$ and $w_2(n)$ are zero mean, possibly spatially correlated, stationary Gaussian processes;

ASa3) $s(n)$ is statistically independent of $w_1(n)$ and $w_2(n)$.

Our task is to estimate the multipath time delays, $D_j$, $j = 1, \ldots, M$, from $x(n)$ and $y(n)$ only.

2.2. New Normal Equations

Let $C_{k,x}(\tau_1, \ldots, \tau_{k-1})$ denote the $k$th order cumulant of $x(n)$,

$$ C_{k,x}(\tau_1, \ldots, \tau_{k-1}) = \mathrm{cum}\{x(n), x(n + \tau_1), \ldots, x(n + \tau_{k-1})\} \qquad (4) $$

and $C_{k,xy}(\tau_1, \ldots, \tau_{k-1})$ the $k$th order cross cumulant of $x(n)$ and $y(n)$,

$$ C_{k,xy}(\tau_1, \ldots, \tau_{k-1}) = \mathrm{cum}\{x(n), \ldots, x(n + \tau_{k-2}), y(n + \tau_{k-1})\} \qquad (5) $$

where $\mathrm{cum}\{\cdot\}$ denotes the cumulant operator. Under ASa1), a simple relationship between the higher- and lower-order cumulants of $s(n)$, which is called the projection property of cumulants, is given by, for $k > l$,

$$ C_{l,s}(\tau_1, \ldots, \tau_{l-1}) = \alpha_{lk} \sum_{\tau_l = -\infty}^{\infty} \cdots \sum_{\tau_{k-1} = -\infty}^{\infty} C_{k,s}(\tau_1, \ldots, \tau_{k-1}) \qquad (6) $$

where $\alpha_{lk} = \gamma_{le} / (\gamma_{ke} [H(0)]^{k-l})$. In particular, when $l = 2$ and $k = 3, 4$, we have

$$ C_{2,s}(\tau_1) = \alpha_{23} \sum_{\tau_2 = -\infty}^{\infty} C_{3,s}(\tau_1, \tau_2) \qquad (7) $$

and

$$ C_{2,s}(\tau_1) = \alpha_{24} \sum_{\tau_2 = -\infty}^{\infty} \sum_{\tau_3 = -\infty}^{\infty} C_{4,s}(\tau_1, \tau_2, \tau_3), \qquad (8) $$

respectively, where $\alpha_{23} = \gamma_{2e} / [\gamma_{3e} H(0)]$, and $\alpha_{24} = \gamma_{2e} / [\gamma_{4e} (H(0))^2]$. In practical computation, "$\infty$" and $C_{3,s}(\cdot)$, $C_{4,s}(\cdot)$ in (7) and (8) are, respectively, replaced by a positive integer $L$ and $\hat{C}_{3,s}(\cdot)$, $\hat{C}_{4,s}(\cdot)$, which are computed from $N$ data samples. It is known that the estimate $\hat{C}_{2,s}(\tau_1)$ is an asymptotically unbiased and strongly consistent estimate of the true autocorrelation of $s(n)$ (within a scale factor), when $L = O(N^{1/(4+\delta)})$, $\delta > 0$.

It is well known that the delayed signal, $s(n - D_j)$, can be expressed approximately as

$$ s(n - D_j) = \sum_{i=-P}^{P} \mathrm{sinc}(i - D_j)\, s(n - i) \qquad (9) $$

where $\mathrm{sinc}(v) := \sin(\pi v) / (\pi v)$, and $P > \max(D_1, \ldots, D_M)$. From (1) and (2), we have

$$ C_{k,xy}(\tau_1, \ldots, \tau_{k-1}) = \sum_{i=-P}^{P} \sum_{j=1}^{M} A_j\, \mathrm{sinc}(i - D_j)\, C_{k,x}(\tau_1, \ldots, \tau_{k-2}, \tau_{k-1} - i) \qquad (10) $$

because of the Gaussianity of the noises. By letting

$$ r_x(\tau_{k-1}) = \sum_{\tau_1 = -L}^{L} \cdots \sum_{\tau_{k-2} = -L}^{L} C_{k,x}(\tau_1, \ldots, \tau_{k-1}) \qquad (11) $$

and

$$ r_{xy}(\tau_{k-1}) = \sum_{\tau_1 = -L}^{L} \cdots \sum_{\tau_{k-2} = -L}^{L} C_{k,xy}(\tau_1, \ldots, \tau_{k-1}) \qquad (12) $$

it follows from (10) and (6) that

$$ r_{xy}(\tau) = \sum_{i=-P}^{P} \sum_{j=1}^{M} A_j\, \mathrm{sinc}(i - D_j)\, r_x(\tau - i) \qquad (13) $$

or

$$ r_{xy}(\tau) = \sum_{i=-P}^{P} a_i\, r_x(\tau - i) \qquad (14) $$

where

$$ a_i = \sum_{j=1}^{M} A_j\, \mathrm{sinc}(i - D_j). \qquad (15) $$

Eqn. (14) forms the new normal equations for time delay estimation using higher order statistics. Concatenating (14) for $\tau = -P, \ldots, P$, we obtain the following matrix equation

$$ A x = b \qquad (16) $$
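The role of the coefficients $a_i = \sum_j A_j\, \mathrm{sinc}(i - D_j)$ can be illustrated with a small numerical sketch. The function name is hypothetical, and the amplitudes, delays and range $P$ are illustrative values; `np.sinc` is the normalized sinc, $\sin(\pi v)/(\pi v)$.

```python
import numpy as np

def delay_coefficients(amplitudes, delays, P):
    """a_i = sum_j A_j * sinc(i - D_j) for i = -P, ..., P."""
    i = np.arange(-P, P + 1)
    a = np.zeros(i.size)
    for amp, delay in zip(amplitudes, delays):
        a += amp * np.sinc(i - delay)   # np.sinc(v) = sin(pi v) / (pi v)
    return i, a

# Integer delays: a_i reduces to A_j at i = D_j and (numerically) zero elsewhere.
i, a_int = delay_coefficients([0.8, 0.6], [2, 5], P=10)

# Fractional delay: the energy spreads symmetrically around the true delay.
_, a_frac = delay_coefficients([1.0], [2.5], P=10)
```

For integer delays the delays and attenuations can be read off directly from the nonzero entries; for fractional delays the peak falls between two indices and an interpolation over the sinc-shaped pattern is needed.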
where

$$ A = \{ r_x(i - j) \}, \quad i, j = -P, \ldots, P \qquad (17) $$

$$ x = [a_{-P}, \ldots, a_{P}]^T \qquad (18) $$

$$ b = [r_{xy}(-P), \ldots, r_{xy}(P)]^T \qquad (19) $$

Note that from (11), (16) and (6), $A$ is equal to a nonsingular autocorrelation matrix (within some non-zero constant scale), which guarantees the uniqueness of the solution to the new normal equations. If the $D_j$'s are all integers, then $a_i = A_j$ for $i = D_j$, $j = 1, 2, \ldots, M$, and $a_i = 0$ for others; otherwise, interpolation computations based upon (15) can be employed for determining

2.3. Algorithm 1

Algorithm 1 for TDE in the NSGN case is now summarized as follows:

(1) estimate $C_{k,x}(\cdot)$ and $C_{k,xy}(\cdot)$ from $x(n)$ and $y(n)$, $n = 0, 1, \ldots, N$;

(2) estimate $r_x(\cdot)$ and $r_{xy}(\cdot)$ using (11) and (12);

(3) estimate $a_i$, $i = -P, \ldots, P$, by solving the matrix equation of (16);

(4) choose the time delays $D_j$'s and the attenuations $A_j$'s from the definition of the $a_i$'s.

3. TDE in GSNN Case

In this section, we study the problem of TDE of a Gaussian signal in spatially correlated non-Gaussian noises.

3.1. Model Assumptions

The measurement models are the same as (1) and (2), while the conditions on the models are as follows:

ASb1) The signal $s(n)$ is a zero-mean, stationary Gaussian process.

ASb2) The noises $w_1(n)$ and $w_2(n)$ are spatially correlated non-Gaussian processes with non-zero $k$th ($k > 2$) cumulants, and $w_2(n)$ can be generated by passing $w_1(n)$ through a linear finite dimensional asymptotically stable system with impulse response $g_i$, i.e.,

$$ w_2(n) = \sum_{i=-Q}^{Q} g_i\, w_1(n - i); \qquad (20) $$

in addition, $\sum_{i} |g_i| > 1$.

ASb3) $s(n)$ is independent of $w_1(n)$ and $w_2(n)$.

3.2. Theoretical Results

From the above assumptions, the measurement models (1) and (2) can be rewritten as

$$ x(n) = s(n) + \sum_{i=-Q}^{Q} g_i\, w(n - i), \qquad (21) $$

$$ y(n) = \sum_{i=-P}^{P} a_i\, s(n - i) + w(n). \qquad (22) $$

The problem is to estimate the $a_i$'s, and thus the time delays $D_j$'s, from $x(n)$ and $y(n)$ only. Note that this problem is similar to that of multichannel signal separation in [6]; the difference lies in the probability distributions of the two components $s(n)$ and $w(n)$: in [6] $s(n)$ and $w(n)$ are both non-Gaussian, whereas in this section $s(n)$ is Gaussian while $w(n)$ is non-Gaussian. It is pointed out that the separation approaches developed in [6] are not applicable to our case, since the coefficient matrix of the related normal equations is zero and obviously singular; however, the hybrid approach developed in this section is also applicable to multichannel signal separation.

Since the signal $s(n)$ is Gaussian and the noise parts $w_1(n)$ and $w_2(n)$ are non-Gaussian, we may use the method developed in the previous section to estimate the $g_i$'s first. Comparing (1) and (2) with (21) and (22), we have

$$ r_{yx}(\tau) = \sum_{i=-Q}^{Q} g_i\, r_y(\tau - i) \qquad (23) $$

where $r_{yx}(\cdot)$ and $r_y(\cdot)$ are defined similarly to $r_{xy}(\cdot)$ and $r_x(\cdot)$ in (12) and (11), respectively. Concatenating (23) for $\tau = -Q, \ldots, Q$, we may obtain the unique solution for the $g_i$'s, since the corresponding coefficient matrix is nonsingular. Once the $g_i$'s are available, one may use the cross-correlation based approach to estimate the $a_i$'s. To this end, we prefilter $y(n)$ in (22) as

$$ \tilde{y}(n) = \sum_{j=-Q}^{Q} g_j\, y(n - j) = \sum_{i=-P}^{P} \sum_{j=-Q}^{Q} a_i g_j\, s(n - i - j) + \sum_{j=-Q}^{Q} g_j\, w(n - j) \qquad (24) $$

Combination of (21) and (24) yields

$$ \bar{y}(n) = x(n) - \tilde{y}(n) = s(n) - \sum_{i=-P}^{P} \sum_{j=-Q}^{Q} a_i g_j\, s(n - i - j) = \sum_{i=-P-Q}^{P+Q} b_i\, s(n - i) \qquad (25) $$

where

$$ b_i = \delta(i) - a_i \circledast g_i, \quad i = -P-Q, \ldots, P+Q \qquad (26) $$

and $\circledast$ denotes convolution.

From (21) and (25), it is easy to show that

$$ r_{\bar{y}}(\tau) = \sum_{i=-P-Q}^{P+Q} b_i\, r_{x\bar{y}}(\tau + i) \qquad (27) $$

Using the above normal equations, we can estimate the $b_i$'s and thus estimate the $a_i$'s from (26).

3.3.  Algorithm  2 

Algorithm 2 for TDE in the GSNN case is now
summarized as follows:

(1) estimate $C_{k,y}(\cdot)$ and $C_{k,yx}(\cdot)$, and thus $\hat{r}_y(\cdot)$ and $\hat{r}_{yx}(\cdot)$,
from $x(n)$ and $y(n)$, $n = 0, 1, \ldots, N$;

(2) estimate $g_i$, $i = -Q, \ldots, Q$, by solving the normal
equations (23);

(3) use (24) and (25) to obtain $\bar{y}(n)$, compute $\hat{r}_{\bar{y}}(\cdot)$ and
$\hat{r}_{xy}(\cdot)$, and then estimate the $b_i$'s using (27);

(4) estimate the $a_i$'s from (26), and thus the $D_j$'s and $A_j$'s.
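
Step (2) amounts to solving a small linear system. The Python sketch below illustrates this under stated assumptions: the correlation-type sequences of (23) are supplied as callables of the integer lag, and the name `solve_g` is hypothetical. It is not the authors' implementation, and the estimation of the sequences themselves is not shown.

```python
import numpy as np

def solve_g(r_yx, r_y, Q):
    """Solve r_yx(tau) = sum_{i=-Q}^{Q} g_i r_y(tau - i), tau = -Q..Q, for g.

    r_yx, r_y: callables mapping an integer lag to a value (assumed interface).
    Returns the vector [g_{-Q}, ..., g_{Q}].
    """
    lags = np.arange(-Q, Q + 1)
    # Row index: tau, column index: i; entry is r_y(tau - i).
    A = np.array([[r_y(tau - i) for i in lags] for tau in lags], dtype=float)
    b = np.array([r_yx(tau) for tau in lags], dtype=float)
    # A must be nonsingular for the solution to be unique.
    return np.linalg.solve(A, b)

# Synthetic check with a known g (illustrative values only).
Q = 1
g_true = np.array([0.2, 1.0, -0.5])
r_y = lambda k: 0.5 ** abs(k)
r_yx = lambda tau: sum(g_true[i + Q] * r_y(tau - i) for i in range(-Q, Q + 1))
g_hat = solve_g(r_yx, r_y, Q)
```

In practice the callables would return sample estimates, so the recovered coefficients are only as good as those estimates.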

4.  Simulations  and  Conclusions 

Two simulation examples are presented to illustrate
the effectiveness of the new algorithms of this paper.
In the simulations, we choose $M = 2$, $D_1 = 2$, $D_2 = 5$,
$A_1 = 0.8$, $A_2 = 0.6$, and $N = 2000$, and define SNR $=
10\log_{10}\{E[s^2(n)]/E[w_1^2(n)]\}$.

Example 1 (NSGN case): In (1) and (2), the signal
and noises are generated by

$$s(n) = u(n) + u(n-1) \qquad (28)$$

$$w_1(n) = w_2(n) = e(n) + 0.8\,e(n-1) \qquad (29)$$

respectively. Here, $u(n)$ and $e(n)$ are i.i.d. exponential
and Gaussian processes, respectively. Figs. 1-3 show
the estimated $a_i$'s, $i = -10, \ldots, 10$, obtained by the
traditional approach [1], Tugnait's approach [4], and our
Algorithm 1 (SNR = 2.5 dB, 20 Monte Carlo runs).
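
The sources of Example 1 can be generated along the following lines. This is only a sketch under stated assumptions: the exponential sequence is centred to zero mean, and the noise is rescaled so that the sample SNR matches the requested value. Neither detail is specified in the text, and `example1_sources` is a hypothetical name.

```python
import numpy as np

def example1_sources(n_samples=2000, snr_db=2.5, seed=0):
    """Generate s(n), w1(n), w2(n) following (28) and (29)."""
    rng = np.random.default_rng(seed)
    # i.i.d. exponential driving sequence, centred (centring is an assumption).
    u = rng.exponential(1.0, n_samples + 1) - 1.0
    # i.i.d. Gaussian driving sequence.
    e = rng.standard_normal(n_samples + 1)
    s = u[1:] + u[:-1]            # s(n) = u(n) + u(n-1)
    w = e[1:] + 0.8 * e[:-1]      # w(n) = e(n) + 0.8 e(n-1)
    # Scale the noise so that 10 log10(E[s^2]/E[w1^2]) equals snr_db
    # in terms of sample averages (assumed normalisation).
    gain = np.sqrt(np.mean(s ** 2) / (np.mean(w ** 2) * 10 ** (snr_db / 10)))
    w = gain * w
    return s, w, w.copy()         # w1(n) = w2(n)

s, w1, w2 = example1_sources()
snr = 10 * np.log10(np.mean(s ** 2) / np.mean(w1 ** 2))
```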
Example 2 (GSNN case): The signal $s(n)$ and
noise $w_2(n)$ are the same as in (28) and (29), while
$w_1(n) = w_2(n) - 0.5\,w_2(n-1)$, and $u(n)$ and $e(n)$
are i.i.d. Gaussian and exponential processes,
respectively. The corresponding results are plotted in Figs. 4-6
(SNR = -2.5 dB, 20 Monte Carlo runs).

From the results shown in Figs. 1-6, we make the
following observations.

(1) The traditional approach always produces three
peaks, two of them due to the signal delays and
the third due to the correlation of the noises.
However, the true time delays cannot be correctly picked
out from the ambiguous peaks.

(2) For the NSGN case, although Tugnait's approach
can give the correct time delays, its estimates
have higher variances than
those obtained by Algorithm 1. For the GSNN
case, Tugnait's approach fails to produce the true
time delays of the Gaussian signals.

(3) On the other hand, our Algorithm 1 produces
correct time delay estimates in the NSGN case, while
Algorithm 2 works well in the GSNN case. Note
that although the SNR is relatively low
in Example 2, Algorithm 2 performs quite well
since it is finally implemented in the domain of
second-order statistics, which is powerful in suppressing the
noises.

References 

[1] G. C. Carter, “Time delay estimation for passive
sonar signal processing,” IEEE Trans. Acoust.,
Speech, Signal Processing, vol. 29, no. 6, 1981,
pp. 463-470.

[2] C. L. Nikias and R. Pan, “Time delay estimation
in unknown Gaussian spatially correlated noise
processing,” IEEE Trans. Acoust., Speech, Signal
Processing, vol. 36, no. 11, 1988, pp. 1706-1714.

[3] J. K. Tugnait, “On time delay estimation with
unknown spatially correlated Gaussian noise
using fourth-order cumulants and cross cumulants,”
IEEE Trans. Signal Processing, vol. 39, no. 6,
1991, pp. 1258-1267.

[4] M. J. Hinich, D. Marandino and E. J. Sullivan,
“Bispectrum of ship-radiated noise,” J. Acoust.
Soc. Amer., vol. 85, Apr. 1989, pp. 1512-1517.

[5] S. A. Kassam, Signal Detection in Non-Gaussian
Noise, New York: Springer-Verlag, 1988.

[6] D. Yellin and E. Weinstein, “Criteria for
multichannel signal separation,” IEEE Trans. Signal
Processing, vol. 42, no. 8, 1994, pp. 2158-2167.




Figure 1. The estimated $a_i$'s obtained via the
traditional approach ($N = 2000$, SNR = 2.5 dB,
20 Monte Carlo runs)


Figure 4. The estimated $a_i$'s obtained via the
traditional approach ($N = 2000$, SNR = -2.5 dB,
20 Monte Carlo runs)


Figure 2. The estimated $a_i$'s obtained via
Tugnait's approach ($N = 2000$, SNR = 2.5 dB, 20
Monte Carlo runs)


Figure 5. The estimated $a_i$'s obtained via
Tugnait's approach ($N = 2000$, SNR = -2.5 dB,
20 Monte Carlo runs)


Figure 3. The estimated $a_i$'s obtained via
Algorithm 1 ($N = 2000$, SNR = 2.5 dB, 20
Monte Carlo runs)


Figure 6. The estimated $a_i$'s obtained via
Algorithm 2 ($N = 2000$, SNR = -2.5 dB, 20
Monte Carlo runs)
Abstract

The analysis of seismicity needs to be restricted to
earthquakes. The estimated seismic energy release
and the spatial distribution of seismic events will be
erroneous and biased if the data catalogue includes
artificial events such as explosions. Therefore, it is
necessary to separate earthquakes and explosions
prior to the compilation of the seismicity in a
particular region. In the canal area of Panama, the
events are equally distributed with signal energy
contents in the same range, making localization and
magnitude estimation ineffective as discriminators.
In this study we use a method based on master-event
correlations for filtering out the explosions in the
seismological catalogue. We use a library of time
series with a priori knowledge of the source type.
The method is based on comparison of unknown
events with the library by second-, third- and
fourth-order cross-correlations. The studied time series are
divided into three parts: the first phases of the P- and
S-waves and a window from the preceding noise.
Twenty known and twenty-one unknown events are
tested with master-event correlation in the second-, third-
and fourth-order domains. The second-order based
method correctly discriminates 40 % of the known
events, while the third- and fourth-order based
correlations succeed in 75 % and 80 % of the
known cases, respectively.

1.  Introduction 

We examine the problem of discrimination between
local chemical explosions and earthquakes in the
canal area of Panama by second-, third- and
fourth-order master-event cross-correlations. Regional and
local seismic sources exhibit considerable
content in both higher-order spectra and statistics,
e.g. [1] and [2]. Chemical explosions generated in a
pre-stressed and fractured region, such as the area around the
Panama canal in Central America, are similar
in seismic energy content to shallow earthquakes,
which makes conventional discrimination methods
difficult to apply. Also, localization is not effective as a
method for identifying sources because of the
small distinction in the spatial distribution of the
events. The additional information in the third- and
fourth-order correlation domains is useful for
reducing the misclassification.

2.  Seismic  data 

The number of man-made seismic events in the
region of central Panama is high because of the work
to increase the water depth and the size of the
canal. Charges as large as 1800 kg of explosives are
used on a daily basis. During a week several hundred
events are generated at the bottom of the canal and on
land. In the area there is also a large number of events
caused by road works, quarries and military activity
both on land and in the water. Some of the explosions
are reported to the seismological unit in Panama City
but many events are not. Figure 1 displays a typical pattern
of the seismicity over a four-year period.
The estimation of many seismological quantities such
as seismic hazard, seismicity and energy release
distribution is impossible with a database polluted
by artificial events. Therefore, it is important to
discriminate between natural and
man-made events on a routine basis. Tables I and II present some
parameters for the events used in the master-event
library, and Table III presents some parameters for the
unknown events. The events are all in the same
magnitude-energy range and in a distance range from
6 km to 74 km from the recording station UPA.


0-8186-8005-9/97 $10.00 © 1997 IEEE
Figure 1. Seismicity map including earthquakes and man-made events in central Panama for the
period from May 1992 to May 1996. The size of the circles is related to the magnitude of the events. CD means
Canal Discontinuity, and UPA, EC02, LGT, BYN and ACH are seismological stations.


No.  Year  M/D      Origin time  Lat. (°N)  Long. (°W)  Mag.  Dist. from UPA (km)  Az. (deg.)
1    1994  May 20   23:01:07     9.10       79.76       2.8   28                   298
2    1994  Sep 28   00:33:03     8.79       79.82       2.8   39                   237
3    1995  Mar 01   07:36:06     8.93       79.61       2.0   10                   232
4    1995  May 12   14:31:56     9.05       79.65       2.9   15                   301
5    1995  May 19   16:43:25     9.15       79.79       2.9   30                   303
6    1995  May 23   02:12:20     9.36       79.98       2.8   64                   289
7    1995  Jun. 06  21:51:07     9.40       79.01       3.3   74                   52
8    1996  Jan 26   12:47:08     9.07       79.02       3.2   57                   80
9    1996  May 02   10:38:00     9.00       79.26       3.3   30                   85
10   1996  May 06   18:03:11     9.16       79.65       3.1   23                   320
11   1996  May 11   07:57:28     9.00       79.26       3.1   30                   83
12   1996  May 16   16:20:07     9.08       79.69       3.2   20                   305

Table I. Earthquakes used in the master-event library. Epicenter (Lat.,
Long.) is the estimated location. Mag. is the duration magnitude;
distance and azimuth give the position relative to the UPA station.
No.  Year  M/D      Origin time  Ripple no.  Lat. (°N)  Long. (°W)  Charge (kg)  Mag.  Dist. from UPA (km)  Az. (deg.)
13   1995  Feb. 06  11:09:18     3           9.07       79.69       158.7        2.6   20                   302
                    :09:19                                          158.7
                    :09:20                                          158.7
14   1995  Feb. 07  11:25:58     2           9.16       79.71       158.8        2.7   28                   303
                    :26:00                                          158.7
15   1995  Feb. 10  12:09:19     3           9.07       79.73       108.8        2.7   24                   298
                    :09:19                                          108.8
                    :09:20                                          108.8
16   1995  Feb. 14  13:08:12     3           9.07       79.68       79.4         2.7   19                   303
                    :08:19                                          79.4
                    :08:22                                          79.4
17   1995  Feb. 17  11:23:59     3           9.10       79.70       174.6        3.0   20                   304
                    :26:47                                          174.6
                    :26:57                                          174.6
18   1995  Feb. 24  14:17:17     3           9.13       79.70       158.7        2.8   24                   308
                    :17:18                                          174.8
                    :17:29                                          174.8
19   1995  Jun. 16  16:00:41     7           9.08       79.72       254.0        2.8   23                   300
                    :00:42                                          254.0
                    :00:44                                          254.0
                    :00:45                                          254.0
                    :00:47                                          254.0
                    :00:49                                          254.0
                    :00:50                                          254.0
20   1995  Sep. 15  17:17:12     1           9.01       79.69       269.8        2.8   17                   283

Table II. Explosions used in the master-event library. Epicenter (Lat., Long.) is the
estimated location of the events. Ripple is the number of charges. Mag. is the
duration magnitude; distance and azimuth give the position relative to the UPA station.


No.  Year  M/D      Origin time  Lat. (°N)  Long. (°W)  Mag.  Dist. from UPA (km)  Az. (deg.)
p1   1994  Dec 29   22:08:57     9.07       79.66       2.5   17                   306
p2   1996  Jan 05   19:18:29     9.04       79.56       2.2   6                    351
p3   1996  Jan 06   16:36:30     9.02       79.63       2.3   12                   286
p4   1996  Jan 08   21:38:30     9.10       79.58       1.9   14                   345
p5   1996  Jan 12   19:43:46     9.00       79.60       2.2   7                    284
p6   1996  Jan 12   19:59:57     9.00       79.60       1.6   8                    280
p7   1996  Jan 19   17:19:49     9.16       79.54       2.7   19                   352
p8   1996  Jan 24   18:55:55     9.02       79.57       1.9   6                    320
p9   1996  Jan 25   11:40:58     9.35       79.96       3.1   62                   306
p10  1996  Jan 25   19:56:27     9.13       79.65       2.0   20                   325
p11  1996  Jan 26   11:03:25     9.13       79.78       2.8   33                   288
p12  1996  Jan 26   19:12:47     8.86       79.70       1.9   23                   234
p13  1996  Feb. 02  13:10:37     9.12       79.67       2.7   21                   305
p14  1996  Feb. 02  16:31:08     9.18       79.63       3.0   24                   333
p15  1996  Feb. 02  16:44:08     9.06       79.63       2.1   14                   307
p16  1996  Feb. 02  18:16:03     9.06       79.60       2.0   12                   320
p17  1996  Feb. 05  11:28:12     9.03       79.80       2.6   30                   280
p18  1996  Feb. 05  17:41:48     9.23       79.86       2.7   46                   300
p19  1996  Feb. 07  19:44:53     9.06       79.62       1.9   13                   311
p20  1996  Feb. 08  17:10:15     9.07       79.64       2.2   16                   311
p21  1996  Feb. 12  11:58:12     9.14       79.71       2.8   26                   305

Table III. Unknown events used in the analysis. Epicenter (Lat., Long.) is the estimated
location of the events. Mag. is the duration magnitude; distance and azimuth
give the position relative to the UPA station.
3. Master-event correlation analysis

We performed master-event correlation analysis by
comparing selected unknown events with a library of
known earthquakes and explosions in the second-, third- and
fourth-order cross-correlation domains. The
measures are all possible non-redundant
cross-correlation combinations for each data window
between the unknown events and the library events.
The data windows are matched along the time axis
and the maximum correlation value, which carries the main
part of the information, is used for the analysis. Some
authors, e.g. [3], use an interval determined by the
maximum time between coherent phases. The
complexity of both the P- and S-phases in our time series
restricts us to using only the lag of maximum
correlation. The conventional second-order
cross-correlation between the time series of the unknown
event x(t) and the time series of one of the known
master events y(t) is defined as
$$m^{2}_{xy}(\tau) = \frac{1}{N} \sum_{t=a}^{N+a} x(t)\, y(t+\tau) \qquad (1)$$

The size N and the position a of the data window
need to be matched to the particular phase of
interest. In this study, the time series of the master
and the unknown event are divided into three data
windows: 1) preceding noise only, 2) the first arriving
phase, normally the P-wave, and 3) the second large
phase, normally the S-wave. We perform the
analysis at the lag τ with the maximum correlation
value. The analysis of the noise window serves as a
reference. Just as the bispectrum and the trispectrum
have properties different from those of the power
spectrum, the third- and fourth-order
cross-correlations measure similarities other than those of the
amplitudes [4]. Also, other possible combinations of
correlations between the known master events
y(t) and the tested event x(t) occur and are useful in the
discrimination analysis. The third-order
cross-correlations are defined as

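$$m^{3}_{xxy}(\tau_1, \tau_2) = \frac{1}{N} \sum_{t=a}^{N+a} x(t)\, x(t+\tau_1)\, y(t+\tau_2) \qquad (2)$$

and

$$m^{3}_{xyy}(\tau_1, \tau_2) = \frac{1}{N} \sum_{t=a}^{N+a} x(t)\, y(t+\tau_1)\, y(t+\tau_2). \qquad (3)$$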

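The fourth-order cross-correlation can be defined in
three different non-redundant combinations as

$$m^{4}_{xxxy}(\tau_1, \tau_2, \tau_3) = \frac{1}{N} \sum_{t=a}^{N+a} x(t)\, x(t+\tau_1)\, x(t+\tau_2)\, y(t+\tau_3), \qquad (4)$$

$$m^{4}_{xxyy}(\tau_1, \tau_2, \tau_3) = \frac{1}{N} \sum_{t=a}^{N+a} x(t)\, x(t+\tau_1)\, y(t+\tau_2)\, y(t+\tau_3), \qquad (5)$$

and

$$m^{4}_{xyyy}(\tau_1, \tau_2, \tau_3) = \frac{1}{N} \sum_{t=a}^{N+a} x(t)\, y(t+\tau_1)\, y(t+\tau_2)\, y(t+\tau_3). \qquad (6)$$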

The cross-correlations in equations (1) to (6) of the
unknown events x(t) are computed against all the library
events y(t) and give values in clusters. The clusters
are compared to both the earthquake and the
explosion 'master' clusters composed of library
events only. The discrimination space for the
second-order cross-correlation values in equation (1)
is spanned by the correlations of both the first and the
second data window. This is not necessary for the
third-order correlation values, spanned by equations
(2) and (3), and the fourth-order estimates, equations
(4) to (6). The shortest squared Mahalanobis
distance of the unknown event determines whether the
particular event is an explosion or an earthquake. If
the clusters overlap by 50 % or more, the method
has failed and the cross-correlation values are not used
in the analysis.
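
The measures and the decision rule can be sketched in Python as below. This is an illustration under stated assumptions rather than the processing actually used in the study: each data window is taken to be an array of N samples, products are summed over the samples where all shifted indices stay inside the window, and the names `cross_moment`, `mahalanobis_sq` and `classify` are hypothetical.

```python
import numpy as np

def cross_moment(signals, lags):
    """(1/N) * sum_t prod_k signals[k][t + lags[k]] over the valid overlap.

    signals: equal-length windows, e.g. [x, y, y] for the x-y-y measure.
    lags: one integer lag per signal; the first is normally 0.
    """
    n = len(signals[0])
    lo = max(0, -min(lags))          # first t with all indices >= 0
    hi = min(n, n - max(lags))       # one past the last t with all indices < n
    t = np.arange(lo, hi)
    prod = np.ones(len(t))
    for sig, lag in zip(signals, lags):
        prod *= np.asarray(sig, dtype=float)[t + lag]
    return prod.sum() / n

def mahalanobis_sq(point, cluster):
    """Squared Mahalanobis distance from a point to a cluster of row vectors."""
    cluster = np.atleast_2d(np.asarray(cluster, dtype=float))
    mu = cluster.mean(axis=0)
    cov = np.atleast_2d(np.cov(cluster, rowvar=False))
    diff = np.atleast_1d(np.asarray(point, dtype=float)) - mu
    return float(diff @ np.linalg.solve(cov, diff))

def classify(point, eq_cluster, ex_cluster):
    """Label by the shorter squared Mahalanobis distance ('eq' or 'ex')."""
    d_eq = mahalanobis_sq(point, eq_cluster)
    d_ex = mahalanobis_sq(point, ex_cluster)
    return "eq" if d_eq <= d_ex else "ex"
```

For example, `cross_moment([x, y], [0, tau])` plays the role of equation (1) and `cross_moment([x, y, y], [0, tau1, tau2])` that of equation (3). The cluster-overlap check described above is not included in this sketch.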

4.  Results 

The procedure begins with defining the position a
and size N of the data window for the noise, P- and
S-wave. Independent P- and S-wave parts (vertical and
horizontal components) are averaged into groups,
numerically quantified in clusters, and displayed in
diagrams, e.g. Figure 2. The analyses are
demonstrated by the cross-correlation technique
described above with a) second-order (m2xy), b)
third-order (m3xyy) and c) fourth-order (m4xxyy,
m4xxxy and m4xyyy) measures. The measures used are
defined in equations (1) to (6). The cross-values
between the master events, earthquakes (open
circles), explosions (crosses) and the unknown
events (star) are measured, and the shortest squared
Mahalanobis distance for each group determines the
classification of the event. The first analysis is to
test each of the master events against the library. The
second-order based method correctly discriminates 40
% of the known events, while the third- and
fourth-order based correlations succeed in 75 % and
80 % of the cases, respectively. The results of the
analysis using 8 explosions and 12 earthquakes as
known master events in a library and 21 unknown
events are compiled in Table IV. For the unknown
events the analysis of second-, third- and fourth-order
measures succeeds in declaring 11 explosions and 8
earthquakes. Two of the events disagree between
the vertical P- and the horizontal S-waves.
[Figure 2: plot panels with axes labelled m3xyy, m4xxyy and phase 2; legend entries: earthq., expl.]


Figure 2. Examples of analysis of an unknown event (star) compared with the master events, earthquakes (open
circles) and explosions (crosses) by a) second-order, b) third-order and c) fourth-order cross-correlation measures. The
unknown event is p1 in Table IV and is estimated to be an earthquake.




No.  P+S m2  P m3  P m4  S m3  S m4  Res. P  Res. S
p1   ex      ex    ex    ex    eq    ex      ex
p2   eq      eq    eq    eq    ex    eq      eq
p3   ex      ex    ex    ex    ex    ex      ex
p4   eq      eq    eq    eq    ex    eq      eq
p5   eq      eq    eq    eq    eq    eq      eq
p6   ex      ex

Table IV. Analysis of the unknown events. The
abbreviation ex. means explosion and eq. means
earthquake.


has  a  robust  physical  meaning  using  real  waveform 
records. 

The seismic catalogue at UPA contains a high
number of presumed explosions. The seismicity
shows that after semi-automatic identification and
removal of the hidden explosions, the hypothesis
of a single active lineament along the Canal
Discontinuity fault becomes rather weak. This
method can assist seismologists in the region in
improving their identification of seismological data.
Therefore, we suggest that the higher-order based master-event
correlation technique be used before
any investigation of or conclusion about local
seismicity, and recommend it for future routine
analysis at other seismological observatories.
5. Discussion and Conclusions

At present, access to digitized waveforms permits
fast evaluation of signal source parameters
applicable to the analysis of weak, low-energy
seismicity (such as natural fault ruptures or man-made
events). In the classification of seismic sources, we
use the quantitative appearance of seismograms in
the higher-order cross-correlation domain, instead of
conventional statistical processing of catalogues,
contrast in magnitude units, or polarity signs and
envelope and energy contents in the time series.

We describe a method to register and discriminate
industrial explosions in the canal region of Panama
against low-magnitude local earthquakes. The
higher-order based master-event cross-correlation
method is introduced and its potential for new
applications in observational seismology is
examined. The study comprises 21 unknown
events compared to a library of 8 explosions and 12
earthquakes, inside the canal area of Panama.
Discrimination by second-, third- and fourth-order
measures succeeds for 11 explosions and 8 earthquakes. Two of
the events disagree between the vertical P- and the
horizontal S-waves. One explanation for this might
be wrong phase identification, multiple events, or a
library too small to cover all the waveform
characteristics. Another possible source of error is
the position and size of the data window and the
matching of the compared phases in each window.
The results suggest that the size
of the events (magnitude) has no influence, and the method seems to
be insensitive to azimuth or distance variation,
without any corrections of the incoming waves.
From a purely seismological point of view, the method
ABSTRACT

In recent years, time delay estimation (TDE)
processors for non-Gaussian conditions were developed and it
has been shown that exploiting the non-Gaussianity of
the processes can lead to improved performance
relative to the Gaussian case. However, while for the latter
standard tools (such as the Cramer-Rao bound) can easily
be applied to predict the achievable performance, for
the general, non-Gaussian case using such tools rarely
leads to analytical solutions. In this paper we present
a closed-form expression for a figure of merit (FOM) for
any TDE processor under any statistical model. It
is then evaluated for different TDE processors and it
is confirmed (by simulations) that the proposed FOM
accurately predicts the performance of these processors
for reasonable values of the time-bandwidth product.

1.  INTRODUCTION 

Time delay estimation (TDE) is a problem with
applications in many areas, such as radar and sonar,
communication, and bio-medical engineering. This problem has
been studied in depth for the case where the additive
noise process is Gaussian [1]. For this case, the
optimal processor and the achievable performance are well
understood. There are applications, however, where
the noise and/or the signal are non-Gaussian. In
recent years, TDE processors for non-Gaussian
conditions were developed. In general, exploiting the
non-Gaussianity of the processes can lead to improved
performance relative to the Gaussian case. However, while
for the Gaussian case standard tools (such as the
Cramer-Rao bound) can easily be applied to predict the
achievable performance, for the general, non-Gaussian case
using such tools rarely leads to analytical solutions.

In this paper we present a closed-form expression for
a figure of merit (FOM) for any TDE processor
under any statistical model. It has been developed under
a simplified version of the TDE problem, namely a
discrete, iid model. For this case, simulation results
show very good agreement with the theoretical results.
However, as we will show in the sequel, TDE
processors for the iid case can be used with non-iid signals
with a performance degradation which is proportional to
the amount of correlation in the processes.

Consider two discrete signals:

$$x_1(i) = s(i) + n_1(i), \quad 1 \le i \le N$$

$$x_2(i) = s(i - D) + n_2(i), \quad 1 \le i \le N \qquad (1)$$

$D$ is an unknown parameter which can take one out of
$K + 1$ possible values, $0 \le D \le K < N$. We
assume that $n_1(i)$, $n_2(i)$ and $s(i)$ are mutually
independent, i.i.d. processes. We denote the probability
density function of $s(i)$ by $f_s$ and that of $n_j(i)$, $j = 1, 2$,
by $f_n$. Our aim is to estimate $D$
given the two vectors $\mathbf{x}_1 = [x_1(1), \ldots, x_1(N)]^T$ and
$\mathbf{x}_2 = [x_2(1), \ldots, x_2(N)]^T$.

Any TDE processor looks for the best match
between $x_2(i)$ and the delayed signal $x_1(i - d)$ for
$0 \le d \le K$. Because of the i.i.d. assumption, such
a processor is of the general form $\max\{z_d\}_{d=0}^{K}$, where

$$z_d = \sum_{i=1}^{N} g(x_1(i - d), x_2(i)) \qquad (2)$$

$g(\cdot, \cdot)$ is a weighting function which operates on pairs
of terms from the two sequences. Different processors
(e.g., linear or non-linear correlators, minimum norm
processors, maximum likelihood, etc.) differ in the
function $g$. However, based on the assumption that the
statistics of the two noise processes are the same, it is
reasonable to assume that $g(y, z) = g(z, y)$.
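
A processor of this general form can be sketched in Python as follows. The sketch makes two assumptions that are not in the text: the sum is restricted to the indices for which the delayed sample exists for every candidate delay, and the function name `tde_estimate` and the two example weighting functions are chosen here for illustration.

```python
import numpy as np

def tde_estimate(x1, x2, K, g):
    """Return (argmax_d z_d, z), with z_d = sum_i g(x1(i - d), x2(i)), d = 0..K.

    Only indices i >= K are used (0-based), so that every candidate delay
    sums the same number of terms (an assumption of this sketch).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n = len(x1)
    z = np.empty(K + 1)
    for d in range(K + 1):
        z[d] = np.sum(g(x1[K - d:n - d], x2[K:]))
    return int(np.argmax(z)), z

# Two symmetric weighting functions, g(y, z) = g(z, y).
linear_correlator = lambda a, b: a * b
min_norm_1 = lambda a, b: -np.abs(b - a)   # maximising z_d minimises the 1-norm

# Synthetic data with a true delay of 3 samples (illustrative values only).
rng = np.random.default_rng(1)
N, K, D = 200, 8, 3
s = rng.standard_normal(N + D)
x1 = s[D:] + 0.1 * rng.standard_normal(N)   # x1(i) = s(i) + n1(i)
x2 = s[:N] + 0.1 * rng.standard_normal(N)   # x2(i) = s(i - D) + n2(i)
d_lin, _ = tde_estimate(x1, x2, K, linear_correlator)
d_l1, _ = tde_estimate(x1, x2, K, min_norm_1)
```

Swapping `g` is the only change needed to move from one processor to another, which is the point of the general form (2).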




2.  THE  PROPOSED  FOM 


Let $z_0 = z(d = D)$, $z_r = z(d = d_r \ne D)$ and $v_r =
z_0 - z_r$. Then the probability of correct decision is:

$$P_c = \Pr\{v_1 > 0, v_2 > 0, \ldots, v_K > 0\} = \Pr\{\mathbf{v} > 0\} \qquad (3)$$

$z_i$ is the sum of $N$ independent random variables with
finite and equal second moments. Thus $\mathbf{v}$ is
asymptotically normal for large $N$. Under the stated assumptions
all $v_i$ have the same means and variances: $E(v_r) =
E(z_0) - E(z_r) = \eta_0 - \eta_1$, $\mathrm{Var}(v_r) = \mathrm{Var}(z_0 - z_r) = \sigma^2$.
$v_r$ and $v_s$ also have the same correlation coefficient
$\rho > 0$ for all $r \ne s$. It follows that

$$E(\mathbf{v}) = (\eta_0 - \eta_1)\mathbf{1}; \quad \mathrm{cov}(\mathbf{v}) = \sigma^2[\rho\,\mathbf{1}\mathbf{1}^T + (1 - \rho)I] \qquad (4)$$

where $\mathbf{1}$ is a $K$-dimensional vector whose elements are
1 and $I$ is the $K \times K$ identity matrix. Let $a$ and $b_r$ be
independent Gaussian random variables

$$a = N(\eta_0 - \eta_1, \rho\sigma^2), \quad b_r = N(0, (1 - \rho)\sigma^2) \qquad (5)$$

then their sum $a + b_r$ has the statistics of $v_r$
specified by (4), so one can think of $v_r$ as the sum of two
components: $v_r = a + b_r$. Because of the independence
of the $b_r$, (3) can now be written in the form:

$$P_c = \int_{-\infty}^{\infty} [\Pr(b_1 > -a \mid a)]^K f_a(a)\, da. \qquad (6)$$


Using  (5)  one  then  obtains 


-L 


2 


K 


(7) 


where  u  =  —  p)(t^  and  erf{x)  =  e  ^^dy. 

Here  we  suggest  the  approximation:  [u;(u)]^ 

j  _j_  gr  E*  ^ 

where  u;(u)  = - ^ ^  and  ci(A')  and  C2(A)  are  pre¬ 

determined  numbers.  Under  this  approximation: 


Pc 


1  +  er/( 


y/p+cl(K)(l-p)  ^ 
2 


(8) 


The probability of correct decision, or any quantity
which is proportional to it, is a reasonable figure of
merit. In particular, under our assumptions, $\Pr\{\hat{D} =
d \mid D\} = P_c$ if $d = D$ and $\Pr\{\hat{D} = d \mid D\} = (1 - P_c)/K$ if
$d \ne D$, so $P_c$ uniquely determines the mean square
error (mse) of any reasonable estimator of $D$:

$$\mathrm{mse}\{\hat{D}\} = \sum_{i=0}^{K} (d_i - D)^2 \Pr\{\hat{D} = d_i \mid D\} = \frac{1 - P_c}{K} \sum_{i=0}^{K} (d_i - D)^2 = (1 - P_c)\,B(K) \qquad (9)$$

where $B(K)$ is a measure of the distance between the
true delay and the alternatives and is independent of
the estimator.
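
As a small numerical illustration of (9), the Python sketch below evaluates $B(K)$ and the resulting mse for candidate delays $d_i = 0, 1, \ldots, K$. The function names are hypothetical, and the values of $P_c$ used are arbitrary inputs rather than results from the paper.

```python
def distance_measure(D, K):
    """B(K) = (1/K) * sum over the K wrong delays of (d_i - D)^2."""
    return sum((d - D) ** 2 for d in range(K + 1) if d != D) / K

def mse_from_pc(pc, D, K):
    """mse = (1 - P_c) * B(K), assuming all wrong delays are equally likely."""
    return (1.0 - pc) * distance_measure(D, K)

# With D = 0 and K = 16 (the values used in the example of Section 2.1):
b_16 = distance_measure(0, 16)        # (1^2 + ... + 16^2) / 16
mse_half = mse_from_pc(0.5, 0, 16)    # mse when P_c = 0.5 (arbitrary input)
```

Because $B(K)$ depends only on $D$ and $K$, comparing processors by their mse is equivalent to comparing them by $P_c$.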

Since  the  error-function  is  a  monotonic  increasing 
function  of  its  argument,  a  proper  figure  of  merit  for 
any  processor  can  be  this  argument,  given  by  (see  (8)): 


FOM  = 


Vp  +  4iE)ii  -  p) 


(10) 


Note that the FOM has units of (the square root of)
the signal-to-noise ratio ($\sqrt{\mathrm{SNR}}$), like the common FOM
used in binary detection problems [2].

It can be shown that, for any given TDE processor of
the form (2) and under the stated statistical
assumptions, the FOM (10) becomes:


FOM  = 


(11) 


where  Rq,  Ri  and  R^  are  evaluated  from  the  2nd  order 
moments  of  the  variables: 

9ij  =  9{s{i)  +  n2(i)),  ij  =  1,2,3: 

Ro  =  [^(^^ii)  “  -£*(^12)]^ 

Ri  ~  -^(5^12)  “  '^E{gi2gi3)  +  E^{gi2) 

R2  =  4:E{gi2gi3)  —  ^E^{gi2)  *f  E(gli)  —  E^{gii) 

“  4:E{giigi2) 4:E{gn)E{gi2)  (12) 


Note that the dependence of the FOM on $N$ and on
$K$ is explicit in (11). The specific TDE processor and
the statistics of the signal and the noise processes affect
the FOM via $R_0$, $R_1$ and $R_2$, which are independent of
$N$ and $K$.

The (approximate) probability of a correct
decision for this problem can also be derived using (8),
and is related to the mse by (9).


2.1.  Example 

We study time delay estimation of a Gaussian signal in
non-Gaussian noise. The signal $s(i)$ is of zero mean and
variance $\sigma_s^2$ and the noise processes are mixed-Gaussian
in the sense that their samples are picked from two
zero-mean Gaussian populations of variances $\sigma_1^2$ and $\sigma_2^2$ with
equal probability. That is:

$$f_n(x) = \frac{1}{2\sqrt{2\pi\sigma_1^2}} \exp\left(-\frac{x^2}{2\sigma_1^2}\right) + \frac{1}{2\sqrt{2\pi\sigma_2^2}} \exp\left(-\frac{x^2}{2\sigma_2^2}\right) \qquad (13)$$


We consider three discrete TDE processors: the
well-known linear correlator [3], the maximum-likelihood
(ML) estimator for this problem [6], and the minormp
Figure 1: The mean square error of the estimate
of the unknown delay D = 0 for a Gaussian signal
in zero-mean, symmetric mixed-Gaussian
noise (… = 10) as a function of the
signal-to-noise ratio. Simulations of 3 processors are
compared with the theoretical results. N = 20, 5000
Monte-Carlo runs.

processor [4,5], which minimizes the norm-p ($p \ne 2$)
of the difference between $x_2(i)$ and $x_1(i - d)$, $d =
0, 1, \ldots, K$. Figs. 1 and 2 present the theoretical
performance of the three processors in comparison with
simulation results. The theoretical performance is
evaluated using (9), where $\eta_0$, $\eta_1$, $\sigma$ and $\rho$ were derived from
the first- and second-order moments of the statistics
of a given processor, which are evaluated directly from
the simulated data by averaging over the
corresponding quantities. The performance of the simulations
is evaluated by averaging over the TDE results of a
given processor using 5000 Monte-Carlo runs for the
case of N = 20 (Fig. 1) and 100 runs for N = 2000
(Fig. 2). The performance is depicted as a function of

the  signal  to  noise  ratio  {SNR):  for  noise  with 

^  ^  =  10.  We  consider  two  cases:  N  =  20  and 

N = 2000. In both cases, K = 16. The two cases
represent the extreme cases where the threshold is placed
at high/low SNR.

The simulation results show very good agreement
with the theoretical performance. Note, however, that
since the theoretical FOM is based on the central
limit theorem, it describes the performance better for
N = 2000 than for N = 20.


^  Note  that  the  linear  correlator  is  also  a  minorm2  processor. 


Figure 2: The mean square error of the estimate
of the unknown delay D = 0 for a Gaussian signal
in zero-mean, symmetric mixed-Gaussian
noise (… = 10) as a function of the
signal-to-noise ratio. Simulations of 3 processors are
compared with the theoretical results. N = 2000,
100 Monte-Carlo runs.

3.  DISCUSSION 

Although the results of Section 2.1 are promising, the basic
assumptions seem restrictive. First, the iid assumption on both the
signal and the noise processes can rarely be justified in practical
applications. Second, we assume that the unknown delay is confined to
multiples of the sampling interval. The two assumptions are related. To
study the effect of correlation in the noise and/or the signal
processes on the results, we have run simulations of estimators
designed under the iid assumption when the actual distributions deviate
from the iid pattern. Mixed-Gaussian correlated samples were generated
by the following procedure: each of 3 independent sets of iid Gaussian
samples was passed through a low-pass filter of the form

(14)

The constant p controls the bandwidth of the filter. If the resulting
sequences are designated as $h_{1i}$, $h_{2i}$ and $h_{3i}$, the
desired mixed-Gaussian correlated sequence $u_i$ is given by

$$u_i = h_{1i}(y_i - 0.5) + h_{2i}(y_i + 0.5). \qquad (15)$$

Here $y_i$ is generated by hard-limiting $h_{3i}$ at levels of +0.5 and
-0.5. Thus

$$y_i = \begin{cases} 0.5, & h_{3i} > 0 \\ -0.5, & h_{3i} < 0 \end{cases} \qquad (16)$$
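
The generation procedure of (15) and (16) can be sketched as follows. The exact low-pass filter of (14) is not legible in this copy, so a first-order recursive filter with constant p is assumed here; the scale parameters `sigma1` and `sigma2` of the two Gaussian components are likewise hypothetical and are not taken from the paper.

```python
import numpy as np

def lowpass(x, p):
    """Assumed first-order low-pass: h[i] = p * h[i-1] + (1 - p) * x[i]."""
    h = np.empty(len(x))
    acc = 0.0
    for i, v in enumerate(x):
        acc = p * acc + (1.0 - p) * v
        h[i] = acc
    return h

def mixed_gaussian_correlated(n, p, sigma1=1.0, sigma2=10.0, seed=0):
    """Correlated mixed-Gaussian samples following eqs. (15) and (16)."""
    rng = np.random.default_rng(seed)
    # Three independent iid Gaussian sets, each low-pass filtered.
    h1 = lowpass(sigma1 * rng.standard_normal(n), p)
    h2 = lowpass(sigma2 * rng.standard_normal(n), p)
    h3 = lowpass(rng.standard_normal(n), p)
    y = np.where(h3 > 0, 0.5, -0.5)        # hard limiter, eq. (16)
    u = h1 * (y - 0.5) + h2 * (y + 0.5)    # mixing rule, eq. (15)
    return u, (h1, h2, h3, y)
```

With this rule, each output sample equals $h_{2i}$ when $h_{3i} > 0$ and $-h_{1i}$ otherwise, so the correlated switching sequence selects between the two Gaussian components.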


To make the effect of correlation explicit, Figs. 3-4 show curves for
the linear correlator and for the maximum likelihood estimator working
with: a) iid signal and noise, b) correlated signal and noise, p = 0.8,
c) correlated signal and noise, p = 0.9. Simulations were


Figure 3: Simulation results of the linear correlator: iid signal and
noise (solid line), iid signal and correlated noise (dashed line), iid
noise and correlated signal (dotted), correlated signal and noise
(p = 0.8) (dash-dot) and more strongly correlated signal and noise
(p = 0.9) (solid +). The mean square error of the time delay estimates
is depicted as a function of the signal-to-noise ratio. Number of data
points: N = 64, number of Monte-Carlo runs: 1000, true delay: D = N/2,
Gaussian signal and mixed-Gaussian noise with $\delta = 10$.

also carried out for iid signal/correlated noise and correlated
signal/iid noise. The results of the various experiments show that the
performance with non-iid signal and/or noise processes degrades
relative to the iid case, by an amount proportional to the amount of
correlation in the noise. Since the proposed FOM describes the
performance for the iid case, it can serve as a lower bound on the
performance for any non-iid case. Note that if the actual delay is not
an integer multiple of the sampling period, the estimation error under
a correct decision is not zero but a constant proportional to the
fractional residual delay, so (3) is no longer correct. However, for
comparing the performance of two estimators it is not important whether


Figure  4:  As  Fig.  3,  for  the  ML  processor. 


the true delay is an integer multiple of the sampling period or not.
Abstract

Time-varying third-order cumulant spectra are proposed as an effective
tool for analyzing phonocardiographic signals, able to detect and
quantify temporal quadratic nonlinear interactions. The cumulant-based
Wigner bispectrum (CWB) is applied to investigate the nonstationarity
and non-Gaussianity of both normal and clinical phonocardiograms.
Significant time-varying bispectral structures are found and discussed.
The Wigner bispectrum is expected to be useful in understanding the
heart sound mechanism and in improving the assisted diagnosis of some
heart diseases.

1.  Introduction 

The diagnostic use of heart sounds has a long history in cardiology.
However, it still does not enable the analyst to obtain both
qualitative and quantitative characteristics of phonocardiographic
(PCG) signals. The mechanisms of heart sounds and their inherent
complexity are still insufficiently understood. An abnormal
phonocardiogram may contain, in addition to the first and second
sounds, murmurs and aberrations caused by different pathological
conditions of the cardiovascular system [1]. A physician's ability to
diagnose from heart sounds is quite limited because the human ear is
poorly suited to cardiac auscultation.

With the rapid development of electronic instrumentation and a better
understanding of the genesis of heart sounds, many characteristics and
other features of PCG signals can be analyzed more accurately by
employing digital signal processing techniques [2] [3]. It has been
shown that the measurement and processing of heart sound signals are
clinically significant. In the past 20 years, a variety of methods have
been used to analyze PCG signals, such as the sound spectrograph,
wavelet detection, power spectral estimation, linear model-based
techniques and so on [4] [5] [6]. However, many approaches are based on
analysis of the power spectrum, i.e. the second-order statistics, of
PCG signals. To extract useful features and more of the information
contained in heart sounds, a newly emerging technique, higher-order
statistical analysis, is proposed in this paper to provide new insights
into the nature of the phonocardiogram. Bispectral analysis has been
developed in response to a need to examine quadratic nonlinear
interactions in heart sound signals. On the other hand, since heart
sounds exhibit marked changes with time and frequency, time-varying
techniques must be considered for analyzing the PCG. The earliest
time-frequency analysis of phonocardiographic signals was based on the
sound spectrograph. Recently, several kinds of windowed Wigner-Ville
distributions and wavelet transformations


0-8186-8005-9/97  $10.00  ©  1997  IEEE 




were applied to detect the temporal power spectra with high resolution
[7] [8]. However, second-order time-frequency representations are not
sufficient for studying the non-Gaussianity of nonstationary PCG
signals. To investigate the nonlinear nature of different nonstationary
and non-Gaussian PCG signals, as was done for the stationary case, it
is important to extend the quadratic time-frequency distribution to
higher-order time-frequency representations [9] [10]. For this purpose,
time-varying higher-dimensional spectra are proposed in this paper for
the analysis and diagnosis of the phonocardiogram. Several kinds of
real PCG data are analyzed using third-order cumulant-based Wigner
higher-order spectra. Significant differences in the transition
patterns between normal and pathological phonocardiographic signals are
analyzed in the time-varying bifrequency domain. The aim is to gain new
insight into the detection of quadratic nonlinear interactions in
nonstationary phonocardiographic signals in terms of time-varying
third-order cumulant spectra.

2.  Wigner  higher-order  spectra 

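Let x(t) be a complex deterministic signal. The k-dimensional local
autocorrelation of x(t) is defined as [11] [12]

$$R_k(t, \tau_1, \tau_2, \ldots, \tau_k) = x^*(t - \alpha) \prod_{i=1}^{k} x(t + \tau_i - \alpha) \qquad (1)$$

where $\alpha$ is an arbitrary time delay. It is given by the following
expression [13]

$$\alpha = \frac{1}{k+1} \sum_{i=1}^{k} \tau_i \qquad (2)$$

The kth-order Wigner higher-order moment spectrum (WHOS) is defined as
the k-dimensional Fourier transform of the local autocorrelation

$$W_k(t, f_1, f_2, \ldots, f_k) = FT_k\{R_k(t, \tau_1, \tau_2, \ldots, \tau_k)\} \qquad (3)$$

The special case of the Wigner higher-order moment spectrum for k = 2
is called the Wigner third-order moment spectrum or Wigner bispectrum
(WB), which is widely used in many practical applications. Therefore,
we focus on the analysis of the third-order cumulant spectra. The
Wigner bispectrum of the signal x(t) is expressed as

$$W_2(t, f_1, f_2) = \int_{\tau_1} \int_{\tau_2} x^*(t - \tau_1/3 - \tau_2/3)\, x(t + 2\tau_1/3 - \tau_2/3)\, x(t + 2\tau_2/3 - \tau_1/3)\, \exp[-j2\pi(f_1\tau_1 + f_2\tau_2)]\, d\tau_1\, d\tau_2 \qquad (4)$$

The definition of the Wigner higher-order moment spectrum is only
suitable for deterministic signals. Many biomedical signals, and the
signals in many other engineering problems, should instead be
considered as non-stationary random processes. It is assumed that
{x(t)} is a non-stationary complex random process with zero mean. The
time-varying third-order cumulant is denoted by

$$c_{3x}(t, \tau_1, \tau_2) = \mathrm{cum}\{x^*(t - \alpha),\; x(t + \tau_1 - \alpha),\; x(t + \tau_2 - \alpha)\}, \quad \alpha = (\tau_1 + \tau_2)/3 \qquad (5)$$

Then the cumulant-based Wigner bispectrum (CWB) is defined as

$$CWB_x(t, f_1, f_2) = \int_{\tau_1} \int_{\tau_2} c_{3x}(t, \tau_1, \tau_2)\, \exp[-j2\pi(f_1\tau_1 + f_2\tau_2)]\, d\tau_1\, d\tau_2 \qquad (6)$$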
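
A discrete-time illustration of the Wigner bispectrum of a single deterministic record is sketched below. It is only a sketch under stated assumptions: the lags are sampled at multiples of three samples (lag 1 equal to 3*m1 and lag 2 equal to 3*m2) so that every time index is an integer, no smoothing window is applied, and the function name `wigner_bispectrum_slice` is hypothetical.

```python
import numpy as np

def wigner_bispectrum_slice(x, t, M):
    """Wigner bispectrum of x at time index t, lags m1, m2 in [-M, M].

    The local triple product x*(t - m1 - m2) x(t + 2*m1 - m2) x(t + 2*m2 - m1)
    is formed on the lag grid and transformed by a 2-D FFT.
    """
    x = np.asarray(x, dtype=complex)
    if t - 3 * M < 0 or t + 3 * M >= len(x):
        raise ValueError("need 3*M samples on each side of t")
    m = np.arange(-M, M + 1)
    m1, m2 = np.meshgrid(m, m, indexing="ij")
    kernel = (np.conj(x[t - m1 - m2])
              * x[t + 2 * m1 - m2]
              * x[t + 2 * m2 - m1])
    # Move the zero-lag sample to index (0, 0) before transforming.
    return np.fft.fft2(np.fft.ifftshift(kernel))
```

For a random process, a cumulant-based estimate would replace the single triple product by an average over realizations or over a short time window; that averaging step is not shown here.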


Two main properties of the CWB are summarized as follows.

(1) If the complex random processes {x(t)} and {y(t)} are statistically
independent and z(t) = x(t) + y(t), then we have

$$CWB_z(t, f_1, f_2) = CWB_x(t, f_1, f_2) + CWB_y(t, f_1, f_2) \qquad (7)$$

(2) If the complex random process {x(t)} is jointly Gaussian with zero
mean and independent and equally correlated real and imaginary parts,
then

$$CWB_x(t, f_1, f_2) = 0 \qquad (8)$$

Therefore, the CWB preserves many important properties of the
third-order statistics of a stationary signal, for example the
elimination of additive Gaussian noise with unknown covariance, while
at the same time characterizing the time variation of the frequency
content of the signal.
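
The two properties can be checked numerically on the underlying third-order statistics. The sketch below uses a plain sample estimate for a real, stationary record rather than the time-varying CWB itself; the estimator name `third_cumulant` and the choice of test signals are assumptions of this example.

```python
import numpy as np

def third_cumulant(x, tau1, tau2):
    """Sample estimate of E[x(t) x(t + tau1) x(t + tau2)] for real x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    lo = max(0, -tau1, -tau2)          # first t with all indices valid
    hi = min(n, n - tau1, n - tau2)    # one past the last valid t
    t = np.arange(lo, hi)
    return float(np.mean(x[t] * x[t + tau1] * x[t + tau2]))
```

Applied to white Gaussian noise the estimate is close to zero, and applied to a skewed signal plus independent Gaussian noise it stays close to the value obtained for the skewed signal alone, in line with (7) and (8).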

3.  Experimental  results  and  discussion 

A personal-computer-based system for heart sound measurement and data
acquisition was implemented. An ECG lead is used to record the ECG as a
timing signal. Normal and clinical PCG signals were digitally recorded
at Beijing Fuwai Hospital. In order to examine nonlinear changes at
high frequencies, sampling rates of 1250 and 2500 samples/s were used.
Each PCG record is 20 seconds long. To investigate the time-varying
quadratic phase coupling of the PCG signals, the cumulant-based Wigner
bispectrum (CWB) technique is employed to analyze the time-bifrequency
distribution of two cases of phonocardiographic signals: normal, and
abnormal with murmurs. The CWB of the heart sound signal is estimated
in discrete time and bifrequency form using the FFT with 2048 points of
heart sound.

The magnitudes of the time-varying bispectrum for the normal and
pathological PCG signals at (a) $n_1 = 1024$, (b) $n_2 = 2048$ and
(c) $n_3 = 3072$ are shown in Fig. 1 and Fig. 2, respectively. It can
be seen that the quadratic phase coupling changes significantly with
time in the time-bifrequency domain. The time-varying bispectral
structures of the two kinds of phonocardiographic signals are clearly
different. The frequencies of the PCG signals that exhibit quadratic
phase


Fig. 1 The time-varying third-order cumulant spectra of the normal
phonocardiogram

Fig. 2 The time-varying third-order cumulant spectra of the
pathological phonocardiogram with murmurs


coupling in the bifrequency domain are time-varying. Some bispectral
peaks are observed for the normal heart sound signal in the relatively
low bifrequency band, with frequencies from 100 Hz to 300 Hz. In the
case of the pathological PCG signal with murmurs, several bispectral
peaks also exist, but they change slowly with time and occur in a
higher bifrequency region, with frequencies ranging from 120 Hz to
500 Hz. This result is expected, since the frequency content of murmurs
is higher than that of the first and second heart sounds. It indicates
that the characteristics of the pathological cardiac vibration have
changed. In practical applications, the heart sounds collected are
unavoidably corrupted by additive Gaussian noise. The effect of
background Gaussian noise on the phonocardiographic signals was
analyzed at several SNRs, defined as 10 log of the signal-to-noise
power ratio. We observe that the CWB technique has high immunity to
additive Gaussian noise with arbitrary covariance, while power spectral
analysis and time-frequency representations are limited in their
ability to eliminate background Gaussian noise.

4. Concluding remarks

We present a new method, the higher-order time-frequency distribution,
for analyzing the nonstationarity and non-Gaussianity of
phonocardiographic signals. It is shown in this paper that
cumulant-based Wigner higher-order spectra are applicable to the
analysis of practical PCG signals. Both normal and pathological heart
sounds were digitally collected and processed using an advanced signal
processing technique: time-varying third-order cumulant spectra. The
cumulant-based Wigner bispectrum (CWB) is found to be a successful tool
for the analysis of heart sounds because PCG signals are characterized
by changes in the bifrequency domain as time progresses. Using the CWB
method, we found significantly different patterns of time-varying
bispectral structure between the PCG signals of normal persons and
those of patients. The CWB reveals more detail about the temporal
quadratic nonlinear information in heart sounds. New nonstationary and
non-Gaussian signal processing techniques provide a powerful tool for
PCG signals, particularly with regard to the nonlinear relationships in
clinical heart sounds. This may be applied to improve the diagnostic
techniques for some heart diseases in a cost-effective way, and
extended to the analysis of other biomedical signals.
Abstract

This paper investigates the application of the non-Gaussian AR model
and parametric bispectral estimation to the analysis of normal and
pathological heart sound signals. The non-Gaussian AR model of PCG
(phonocardiogram) signals is used to detect quadratic nonlinear
interactions and to classify two patterns of phonocardiograms in terms
of the parametric bispectral estimate. The bispectral cross-correlation
is proposed for determining the order of the model. Several real PCG
records are analyzed to show that quadratic nonlinearity exists in both
normal and clinical heart sounds. It was found that parametric
bispectral techniques are effective and useful tools for analyzing PCG
and other biomedical signals, such as EMG, ECG and EEG.


1.  Introduction 

Cardiac auscultation has long been a basic clinical tool for analyzing
heart sounds. However, the quantitative phonocardiogram (PCG) and
advanced processing techniques have lagged behind. The understanding of
heart sound mechanisms and of the inherent complexity of PCG signals is
still limited. With the development of digital signal processing
techniques, many research results have shown that the measurement and
processing of PCG signals are clinically significant because the
signals carry plenty of useful information relating to the different
states of the heart. A variety of methods, such as the sound
spectrograph, envelope distribution, FFT techniques, time-frequency
representations and model-based methods, are widely used to extract
information from PCG signals for clinical purposes. Both time- and
frequency-domain analyses have been developed for PCG signals [1] [2]
[3]. Parametric methods, such as the Burg and Marple algorithms, are
widely applied to the analysis and classification of PCG signals since
they give better estimates of the spectral features of the signals [4]
[5]. However, many methods for PCG signal processing based on the power
spectrum or correlation start from the assumptions of linearity,
Gaussianity and minimum-phase systems [6]. In other words, all these
methods generally depend on second-order statistics. In practical
applications, PCG signals often do not comply with these assumptions.
To understand the exact features of phonocardiographic signals and
extract more of the information they contain, this contribution
proposes higher-order spectra (HOS) for the analysis and classification
of heart sounds. HOS can reveal more information than the power
spectrum can. Cumulants, or polyspectra, provide both amplitude and
phase information about the signals [7] [8]. Furthermore, HOS provide a
measure of Gaussianity, since cumulants and polyspectra of order higher
than two are identically zero for any stationary Gaussian process [9].
Consequently, HOS-based analysis significantly increases the SNR when
the signals are contaminated by additive Gaussian noise with unknown
covariance.

The purpose of this contribution is threefold: to discuss the
non-Gaussian AR model of PCG signals, to determine the order of the
non-Gaussian AR model for PCG signals, and to analyze the
classification of two patterns of phonocardiogram by means of the
bispectrum.

2.  Method 

2.1  Non-Gaussian  AR  model  of  PCG  signals 

It is assumed that the phonocardiographic signals are non-Gaussian and
third-order stationary. Let a PCG sequence {s(k)} be modelled by a
non-Gaussian AR model as follows [10]:

$$s(n) + \sum_{k=1}^{p} a_k\, s(n-k) = w(n) \qquad (1)$$

where w(n) is a white, non-Gaussian, third-order stationary i.i.d.
process with zero mean, $E[w^2(n)] = Q$ and $E[w^3(n)] = \beta \neq 0$.
The system given in (1) is assumed to be stable, and s(k) is
statistically independent of w(n) for k < n. It follows from (1) that
[11]

$$C_{3s}(-m, -n) + \sum_{k=1}^{p} a_k\, C_{3s}(k-m, k-n) = \beta\, \delta(m, n), \quad m, n \ge 0 \qquad (2)$$

where $C_{3s}(m, n)$ is the third-order cumulant of the sequence s(k).
The parameters $\{a_k\}$ can be determined by solving expression (2)
along the line m = n = 0, 1, ..., p.
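
Solving (2) along the diagonal m = n = 0, 1, ..., p gives p + 1 linear equations in the p AR coefficients and the input skewness. A minimal sketch of that step follows; it uses a biased sample cumulant estimate and a direct linear solve, both of which are choices made for this example rather than details taken from the paper, and all function names are hypothetical.

```python
import numpy as np

def diag_cumulant(s, tau):
    """Biased estimate of C3s(tau, tau) = E[s(t) s(t + tau)^2], zero-mean s."""
    n = len(s)
    if tau >= 0:
        return np.sum(s[:n - tau] * s[tau:] ** 2) / n
    return np.sum(s[-tau:] * s[:n + tau] ** 2) / n

def fit_ar_third_order(s, p):
    """Estimate a_1..a_p and beta from eq. (2) with m = n = 0..p."""
    s = np.asarray(s, dtype=float)
    s = s - s.mean()
    c = {tau: diag_cumulant(s, tau) for tau in range(-p, p + 1)}
    A = np.zeros((p + 1, p + 1))
    b = np.zeros(p + 1)
    for j in range(p + 1):              # row j: eq. (2) with m = n = j
        for k in range(1, p + 1):
            A[j, k - 1] = c[k - j]      # coefficient of a_k
        b[j] = -c[-j]
    A[0, p] = -1.0                      # beta appears only when m = n = 0
    sol = np.linalg.solve(A, b)
    return sol[:p], sol[p]

def simulate_ar(a, n, seed=0):
    """Simulate eq. (1) driven by skewed iid noise (exponential minus 1)."""
    rng = np.random.default_rng(seed)
    w = rng.exponential(1.0, n) - 1.0   # zero mean, E[w^3] = 2
    s = np.zeros(n)
    for t in range(n):
        acc = w[t]
        for k, ak in enumerate(a, start=1):
            if t - k >= 0:
                acc -= ak * s[t - k]
        s[t] = acc
    return s
```

Because third-order sample cumulants have high variance, long records are needed for stable estimates; the simulated input here is only a convenient skewed test signal, not a model of heart sound excitation.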

2.2  The  order  determination  of  non-Gaussian 
AR  model 

As in most parametric spectral estimation problems, the model order in
(2) must be chosen. Criteria such as FPE, AIC or CAT are not valid for
order determination of the model given in expression (1), since all of
these criteria were developed under the assumption of second-order
statistics [12]. Some methods for order selection of ARMA models have
been reported, such as the SVD approach [13] and information-theoretic
criteria [14]. Third-order statistics are found to be appropriate for
order selection of a non-Gaussian ARMA model in which the non-Gaussian
input is contaminated by additive Gaussian noise with unknown
covariance. For the model in (2), an approach to AR model order
selection is proposed for the parametric bispectral estimation of the
PCG signals [15]. The bispectral cross correlation (BCC) is defined as:

$$BCC = \sum_{\omega_1} \sum_{\omega_2} |D(\omega_1, \omega_2)| \cdot |B_p(\omega_1, \omega_2)| \qquad (3)$$

where $D(\omega_1, \omega_2)$ represents the bispectral estimate
obtained by the conventional method, and $B_p(\omega_1, \omega_2)$ is
the parametric bispectral estimate with order p. Note that the energy
content of each AR estimate should be normalized when using (3). The
BCC changes with the AR model order p. The optimal order is the one at
which the BCC attains its maximum value.
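
The order selection rule based on (3) can be sketched as below. The sketch assumes the conventional and parametric bispectral estimates are already available as arrays on the same bifrequency grid; normalizing both magnitudes to unit energy (the text only requires it for the AR estimates) and the names `bcc` and `select_order` are assumptions of this example.

```python
import numpy as np

def bcc(D, B):
    """Bispectral cross correlation of two estimates on the same grid."""
    d = np.abs(np.asarray(D))
    b = np.abs(np.asarray(B))
    d = d / np.linalg.norm(d)    # unit-energy magnitude (assumed)
    b = b / np.linalg.norm(b)    # energy normalization of the AR estimate
    return float(np.sum(d * b))

def select_order(D, parametric):
    """Pick the order whose parametric estimate maximizes the BCC.

    `parametric` maps each candidate order p to its bispectral estimate.
    """
    scores = {p: bcc(D, B) for p, B in parametric.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

With both magnitudes normalized, the BCC is at most 1 and reaches 1 only when the two magnitude surfaces are proportional, which makes values for different orders directly comparable.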

2.3  Pattern  classification  based  on  the  AR 
model 

Since HOS retain both amplitude and phase information of the signal,
features of the HOS can be extracted for the purpose of clinical
pattern classification. Moreover, features derived from the HOS of the
signals have high immunity to additive Gaussian noise. The parameters
of the non-Gaussian AR model are proposed to form the feature vector
for the pattern classification of PCG signals. The classification of
normal and patient phonocardiograms is analyzed. A minimal set of
parameters $\{a_k\}$ obtained from equation (2) is chosen as the
feature vector. Both linear and nonlinear classifiers for the different
PCG signals are implemented and tested. A group of normal and clinical
phonocardiographic signals is employed for the two-class pattern
classification. Results with different orders of the non-Gaussian AR
model are provided to demonstrate that pattern classification in terms
of third-order statistics is effective and applicable. The
classification performance under different noise backgrounds is
investigated to exhibit the high SNR performance obtained when the
parametric bispectral technique is applied to the pattern
classification of PCG signals.

3.  Results  and  discussion 
A computer-based system for PCG signal detection and data acquisition
was built, in which the ECG was collected as a timing signal. 10 normal
and 12 patient PCG signals were digitally recorded at Fuwai Hospital in
Beijing and stored as data files for postprocessing. The analog PCG
signals were converted to digital format through an A/D converter at
sampling rates of 1250 Hz and 2500 Hz. In order to understand the
bispectral structure of PCG signals, both the FFT method and the
parametric method are used for bispectral estimation of the PCG. The
amplitude bispectra obtained by the conventional method and by the
non-Gaussian AR model are shown in Fig. 1 (a) and (b), respectively.


Fig. 1 The amplitude bispectra of normal PCG by FFT and parametric
model

Fig. 2 The bispectra of clinical PCG with MI and MS

Comparing the bispectral structures, it is found that the parametric
approach offers much better resolution in bispectral estimation. Fig. 2
(a) and (b) give the parametric bispectral estimates obtained from two
records of clinical PCG signals with the heart diseases MI and MS. The
results clearly indicate that different bispectral structures exist
between the normal and some abnormal phonocardiograms. Consequently,
PCG signals should be treated as a non-Gaussian process. For the normal
phonocardiogram, there exist four bispectral peaks with frequencies
from 110 Hz to 180 Hz in the bifrequency domain. For the pathological
phonocardiogram, four bispectral peaks also exist in the bifrequency
domain, but the bispectral structure is quite different from that of
the normal PCG. The bispectral peaks clearly move to the high-frequency
area of the bifrequency domain, with frequencies ranging from 180 Hz to
500 Hz, owing to the pathological conditions and the murmurs. The
bifrequencies corresponding to the significant levels of detected
quadratic nonlinearity are summarized in Table 1.


Table 1. Bifrequencies (f1, f2), in Hz, at which bispectral peaks occur
for three kinds of PCG signals in the bifrequency domain.

              B1 (f1, f2)   B2 (f1, f2)   B3 (f1, f2)   B4 (f1, f2)
normal        120, 120      180, 120      280, 120      180, 100
clinical 1    280, 280      500, 280      280, 220      280, 180
clinical 2    270, 270      270, 160      470, 340      0, 270

Bispectral estimation of the PCG was also tested in background noise at
different SNRs. Fig. 3 (a) shows one record of a typical normal
phonocardiographic signal contaminated with white Gaussian noise.
Fig. 3 (b) shows the amplitude of the parametric bispectral estimate.
Comparing Fig. 1 with Fig. 3, it can be seen that the bispectral
analysis has high immunity to the background noise.

The bispectral cross correlation (BCC) is used to determine the optimal
order of the non-Gaussian AR model for the normal PCG. Bispectral
estimates with different model orders p were computed to obtain the BCC
values. We observe that the maximum BCC value occurs when the order p
is equal to 13. Moreover, the BCC values change irregularly as the
model order increases. We found that the optimal order of the
non-Gaussian AR model is between 10 and 15 for a normal PCG sequence.

The model parameters represent the third-order statistics of the
signals. They are selected to form the feature vectors for two-class
pattern classification of phonocardiograms: normal, and clinical with
murmurs. A nonlinear method is applied to differentiate the normal and
the pathological phonocardiograms. These feature vectors, representing
the third-order statistics of the PCG data, provided good
classification accuracy even when the model order p was less than the
optimal order 13. A group of real normal and pathological PCG records
is employed for the two-class pattern classification based on the
parametric bispectral estimate. Some experiments on the classification
of noisy patterns were also carried out to demonstrate the high SNR
performance obtained when third-order cumulants are used for the
pattern classification of PCG.


Fig. 3 The PCG signal contaminated with additive Gaussian noise (a,
plotted against time) and the bispectrum (b)

4.  Conclusion 

Higher-order spectra were used in this paper to investigate heart sound
signals. A non-Gaussian AR model was fitted for the bispectral
estimation and pattern classification of phonocardiograms. Parametric
bispectral estimates of the PCG were employed to detect and quantify
the presence of quadratic phase coupling occurring in the different
clinical phonocardiograms. Furthermore, the parameters of the
non-Gaussian AR model were used as the features for two-class pattern
classification. Several real PCG records were analyzed in terms of the
bispectrum. The results demonstrate that a PCG should be treated as a
non-Gaussian process. Different bispectral structures exist in normal
and pathological heart sounds. The information in the bispectrum was
used as the main feature for the two-class pattern classification of
the PCG signals. It was found that parametric bispectral techniques are
useful tools for analyzing the quadratic nonlinear interactions of
biomedical signals, such as PCG, ECG and EEG. Polyspectra may be an
effective and useful tool for understanding the basic heart sound
mechanism and improving diagnostic sensitivity via heart sounds.
ABSTRACT

The bispectrum of a direct sequence spread spectrum communication
signal contaminated with coupled multitone jamming provides significant
information about the jammer frequencies and thus permits its excision
using a bank of notch filters. Jammers that can be modeled as an
autoregressive process can also be examined in the bispectral domain
and mitigated using a linear FIR filter with temporally changing
coefficients. The results indicate that utilizing bispectral analysis
in DSS spread spectrum communications is an attractive alternative to
conventional excision techniques and, in some scenarios, the only
excision option available.

Keywords:  Excision,  bispectrum,  spread  spectrum. 

1.  INTRODUCTION 

Interference mitigation in military communication systems can be
accomplished using array processing, in which a jammer or an undesired
signal is spatially nulled in favor of the desired signal.
Alternatively, the desired narrowband signal may be spread spectrally
prior to transmission according to a preset pseudo-noise code and then
despread upon reception. Simultaneously, any channel interference is
spread at the receiver to a much lower signal level and then excised
using an appropriate filtering technique. Spreading the signal prior to
transmission is often accomplished by modulating the desired narrowband
signal using a spreading sequence such as a pseudo-noise code word
sequence, or by modulating the desired signal in a pseudo-random manner
(frequency hopping).

The spread spectrum concept also permits code division multiple access
(CDMA) communications, in which many signals share the same time and
frequency range but are multiplexed according to a combination of
pseudo-random sequences. CDMA has many civilian and military
applications, particularly in wireless cellular technology. Clearly,
excision of deliberate jamming as well as multipath interference is a
problem that requires considerable attention. Excision can be
accomplished using a filter whose coefficients are estimated in a
minimum mean square sense assuming prior knowledge of signal
statistics, or using excision in the transform domain, the
time-frequency domain, or the time-scale domain [3, 6]. Interference
excision can also be accomplished using adaptive processing and
learning techniques, closed-loop estimation and locking methods, or
open-loop filtering. The appropriate excision system depends on the
nature of the jammer and on its statistics and stationarity. The
temporal characteristics of the interference are also important factors
in determining the proper excision operation. In this paper we consider
two specific types of jammers with inherent temporal changes, namely
multitone jammers with implicit coupling, and jammers that can be
modeled as the output of a non-Gaussian autoregressive system.

2.  MULTITONE  COUPLED  JAMMING 

Direct sequence spread-spectrum (DSS) communication
systems are vulnerable to line-of-sight as well as
multipath components of an interference source, in addition
to the multipath components of the desired DSS
communication signal. Since the bispectrum of the desired
communications signal is theoretically zero, while
that of the correlated components of the multipath
interference is not, it is argued that the bispectral
information of a received communications signal can be used
to identify the instantaneous frequencies of the correlated
jammer components and can thus be utilized in
placing a more accurate and effective spectral excision
filter.

In this paper we extract bispectral information for
a class of jammers with multipath components. The
higher order spectrum of the desired signal is assumed
known a priori. A bi-frequency excision filter is then
applied to the spread-spectrum signal to remove the
correlated spectral components of the interference. Time
varying excision filters are of course needed to track the
instantaneous frequencies of the correlated interference
sources. The bispectrum is thus computed for data
segments short enough to maintain the stationarity
of the jammer but long enough to obtain low
variance bispectral estimates.

The desired direct sequence spread spectrum signal
associated with the $n$-th information bit $I_n$ of the
$m$-th user is

$$h_{nm}(t) = \sum_{k=1}^{L} p_{nm}(k)\, q(t - kT_c) \qquad (1)$$

where $L$ denotes the number of PN chips per information
bit, $p_{nm}(k)$ represents the $k$-th bit of the
$m$-th PN code, $q(t)$ is the chip pulse, and $T_c$ is
the chip duration. The desired DSS transmitted signal
associated with the $m$-th user is thus

$$s_m(t) = \sum_{n} I_n\, h_{nm}(t - nT_b) \qquad (2)$$

where $T_b = LT_c$ represents the bit duration and $I_n$ is the
$n$-th information bit. A code division multiple access
signal is then expressed as

$$s(t) = \sum_{m} s_m(t) \qquad (3)$$

The transmitted signal is corrupted with additive jamming
and additive white noise and can be expressed
as

$$r(t) = s(t) + j(t) + n(t) \qquad (4)$$

where $n(t)$ is white zero-mean Gaussian noise. The
multitone jammer is modeled as

$$j(t) = \sum_{l=1}^{P} \sum_{k=1}^{3} a_{kl} \cos(\lambda_{kl} t + \theta_{kl}) \qquad (5)$$

where $\lambda_{3l} = \lambda_{2l} + \lambda_{1l}$, which indicates a coherent
coupling between the jammer's multitone components. The
amplitudes of these components are denoted by $a_{kl}$, and
the angles $\theta_{kl}$ are random phases uniformly distributed
over $(0, 2\pi)$, where $\theta_{3l} = \theta_{1l} + \theta_{2l}$. The number of
coupled multitone signals is denoted by $P$.
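The jammer model above can be sketched in a few lines. This is only an illustration under stated assumptions: a single coupled triplet ($P = 1$), unit amplitudes, integer sample times, and tone frequencies placed exactly on FFT bins; the function name `coupled_multitone_jammer` and all parameter values are hypothetical and not taken from the paper.

```python
import numpy as np

def coupled_multitone_jammer(n_samples, lam1, lam2, amps=(1.0, 1.0, 1.0),
                             coupled=True, rng=None):
    """One triplet (P = 1) of the multitone jammer: tones at lam1, lam2, lam1 + lam2.

    With coupled=True the third phase is theta1 + theta2 (coherent coupling);
    otherwise it is drawn independently.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(n_samples)
    theta1, theta2 = rng.uniform(0.0, 2.0 * np.pi, size=2)
    theta3 = theta1 + theta2 if coupled else rng.uniform(0.0, 2.0 * np.pi)
    lam3 = lam1 + lam2
    return (amps[0] * np.cos(lam1 * t + theta1)
            + amps[1] * np.cos(lam2 * t + theta2)
            + amps[2] * np.cos(lam3 * t + theta3))

# Assumed example values: 256 samples, tones on FFT bins 20 and 35 (sum bin 55).
n = 256
k1, k2 = 20, 35
jam = coupled_multitone_jammer(n, 2 * np.pi * k1 / n, 2 * np.pi * k2 / n,
                               rng=np.random.default_rng(0))
spectrum = np.abs(np.fft.rfft(jam))
strongest_bins = sorted(np.argsort(spectrum)[-3:].tolist())
```

The power spectrum shows the three tones but carries no information about whether their phases are coupled; that distinction only appears in third-order statistics.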

The above jammer model is used in the simulation
phase of this study, but the concept of excision using
bispectral features can be implemented with any
jammer that exhibits definable bispectral features,
assuming that the bispectrum of the desired signal is
theoretically zero.

The envisioned receiver estimates the bispectrum of
the received signal, which is the two-dimensional Fourier
transform of the third order cumulant $C_r(m,n)$ of the
received signal:

$$C_r(m,n) = E\{r(t)\, r(t+m)\, r(t+n)\} \qquad (6)$$

$$B_r(\omega_1,\omega_2) = \sum_{m}\sum_{n} C_r(m,n)\, e^{-j(\omega_1 m + \omega_2 n)}$$

where $E\{\cdot\}$ denotes the statistical expectation. The
additive white Gaussian noise is independent of the
pseudo-noise-like information signal, and both have
theoretically zero bispectra, thus

$$B_r(\omega_1,\omega_2) = B_j(\omega_1,\omega_2) \qquad (7)$$

and the bispectrum of the received signal is entirely due
to the jammer signal and its multipath components and
any other coupled components.
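To make the role of phase coupling concrete, the following is a minimal sketch of a direct (FFT-based, segment-averaged) bispectrum estimate applied to a simulated received signal. It is not the estimator used in the study: the segment length, number of segments, tone bins, unit amplitudes, and the use of independent random phases in every segment are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def bispectrum_direct(segments):
    """Average X(k1) X(k2) conj(X(k1 + k2)) over the rows of `segments`."""
    n_seg, m = segments.shape
    half = m // 2
    X = np.fft.fft(segments - segments.mean(axis=1, keepdims=True), axis=1)
    k = np.arange(half)
    k_sum = k[:, None] + k[None, :]          # k1 + k2 <= m - 2, so no wrap-around
    triple = X[:, k][:, :, None] * X[:, k][:, None, :] * np.conj(X[:, k_sum])
    return triple.mean(axis=0)

def received_segments(n_seg, m, k1, k2, coupled, rng):
    """Rows of r = s + j + n: random +/-1 chips, three jammer tones, Gaussian noise."""
    t = np.arange(m)
    lam1, lam2 = 2 * np.pi * k1 / m, 2 * np.pi * k2 / m
    rows = []
    for _ in range(n_seg):
        th1, th2 = rng.uniform(0.0, 2.0 * np.pi, size=2)
        th3 = th1 + th2 if coupled else rng.uniform(0.0, 2.0 * np.pi)
        jam = (np.cos(lam1 * t + th1) + np.cos(lam2 * t + th2)
               + np.cos((lam1 + lam2) * t + th3))
        chips = rng.choice([-1.0, 1.0], size=m)   # PN-like stand-in for s(t)
        noise = rng.standard_normal(m)
        rows.append(chips + jam + noise)
    return np.array(rows)

rng = np.random.default_rng(1)
B_coupled = np.abs(bispectrum_direct(received_segments(64, 128, 10, 17, True, rng)))
B_uncoupled = np.abs(bispectrum_direct(received_segments(64, 128, 10, 17, False, rng)))
peak = np.unravel_index(np.argmax(B_coupled), B_coupled.shape)
```

In this sketch the largest bispectral magnitude for the coupled jammer falls at the bin pair of the two base tones, while the same three tones with independent phases average toward zero at that pair.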

The bispectrum of the received signal has twelve
symmetry regions in the bi-frequency plane. It is sufficient
to examine the estimated bispectrum in one of
these twelve symmetry areas, where a peak search algorithm
is implemented to locate those bispectral components
that are above a certain level (the bispectrum
of white Gaussian noise in our case). The search algorithm
identifies peaks within a window of 8 or 12
bispectral pixels. The algorithm covers the entire bispectral
domain area defined by $\omega_1 + \omega_2 < \pi$, which
is sufficient assuming a properly sampled received signal.
The frequency pair $(\omega_1, \omega_2)$ associated with each
bispectral peak indicates the presence of three jammer
tones at frequencies $\omega_1$, $\omega_2$ and $\omega_1 + \omega_2$. The bispectrum
may also identify a jammer tone that is not coupled
with other tones; such a bispectral peak appears
on one of the axes in the bifrequency plane. In addition
to the bispectrum, the receiver may also examine
the power spectrum of the incoming signal as well as
its time-frequency or time-scale signatures. Thus the
proposed excision system could operate either independently
or in conjunction with a spectral based excision
system. The bispectrum based excision system clearly
identifies all jammer components that may have been
missed using spectral analysis, either because of their
low level or because of the presence of a strong degree
of coupling between jammer components. A nonparametric
estimation of the bispectrum results in a
frequency resolution that is equivalent to that of the
spectrum and is limited by the entropy of the received
signal. A parametric bispectral estimation technique
could improve the quality of the estimates of the
jammer frequencies. The parametric estimation of
the jammer frequencies from their bispectral signatures
is not included in this study.

Excision is accomplished using a cascade of notch
filters or a multinotch filter tuned to the estimated jammer
frequencies. Three types of filters were examined
in this study. The first is the FIR linear phase filter

$$H(e^{j\omega}) = \sum_{k=1}^{M} b_k e^{-j\omega k} \qquad (8)$$

where the symmetric coefficients are obtained such that
the filter response at the nulling frequency $\lambda_n$ satisfies
$H(e^{j\lambda_n}) = 0$. Three or five tap finite impulse response
filters similar to those proposed in [1, 6] were utilized in this
study to perform the appropriate excision. The results
were inferior simply because such wide notch filters perform
poorly in a cascade environment. Alternatively, a
multinotch FIR filter was designed and the coefficients
were estimated in a least mean square sense to fit the
constraints of the multinotch filter. The resulting filter
exhibited high ripple error, did not produce sufficient
jammer excision, and resulted in significant signal
distortion. Successful nulling of multitone jamming is
accomplished using a cascade of IIR notch filters or
using a multinotch IIR filter. An IIR notch filter is given as

$$H(e^{j\omega}) = \frac{e^{j2\omega} - 2e^{j\omega}\cos(\lambda_n) + 1}{e^{j2\omega} - 2(1-B)\,e^{j\omega}\cos(\lambda_n) + (1-B)^2} \qquad (9)$$

where $B$ is the notch bandwidth normalized with respect
to the sampling period. A multinotch filter can
also be used for excision of multitone signals

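$$H(e^{j\omega}) = \frac{1}{1 + \sum_{n} B_n \left[ U\!\left(e^{j(\omega - \lambda_n)}\right) + U\!\left(e^{j(\omega + \lambda_n)}\right) \right]} \qquad (10)$$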

where $U(x) = 1/(x - 1)$, and $B_n$ denotes the notch
filter bandwidth at the frequency $\lambda_n$. While an IIR
filter is easily implemented using today's high speed
digital signal processors, phase nonlinearity, which leads
to signal distortion, remains a concern. Furthermore,
IIR filters that span segments of temporally changing
frequency content are less desirable in a highly
time-dependent jamming environment.
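As a rough illustration of cascaded notch excision, the sketch below evaluates the frequency response of a second-order IIR notch and of a cascade tuned to three coupled tones. It assumes the common notch form with zeros on the unit circle at $\pm\lambda_n$ and poles at radius $1 - B$; that pole-radius convention, the function names, and the numerical values are assumptions made for this example rather than details confirmed by the paper.

```python
import numpy as np

def notch_response(omega, lam, bandwidth):
    """Frequency response of a second-order IIR notch centred at lam.

    Assumed form: zeros at exp(+/- j*lam), poles at (1 - bandwidth) * exp(+/- j*lam).
    """
    z = np.exp(1j * np.asarray(omega, dtype=float))
    r = 1.0 - bandwidth
    num = z**2 - 2.0 * np.cos(lam) * z + 1.0
    den = z**2 - 2.0 * r * np.cos(lam) * z + r**2
    return num / den

def cascade_response(omega, lams, bandwidth):
    """Response of a cascade of notches, one per frequency in lams."""
    omega = np.asarray(omega, dtype=float)
    h = np.ones(omega.shape, dtype=complex)
    for lam in lams:
        h = h * notch_response(omega, lam, bandwidth)
    return h

# Hypothetical jammer estimate: two tones from a bispectral peak plus their sum.
lam1, lam2 = 0.6, 1.1
tones = [lam1, lam2, lam1 + lam2]
gain_at_tones = np.abs(cascade_response(tones, tones, 0.02))
gain_away = np.abs(cascade_response([2.6], tones, 0.02))[0]
```

The cascade nulls all three tones while leaving frequencies away from the notches close to unity gain; a smaller bandwidth narrows each notch at the cost of a longer transient.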

There are several factors that affect the performance
of the proposed bispectral based excision system. The first
is the stationarity of the signal, the noise, and the jammer,
and in particular the third order stationarity of
the jammer. While the assumption of a stationary signal
and a stationary additive noise is a valid one, the
assumption of a perfectly third order stationary multitone
jammer is not absolute. The proposed excision system
mitigates any jammer that exhibits genuine bispectral
features at frequency pairs in the bispectral domain. It
is therefore necessary to assume that the received signal
is third order stationary within a short observation
window (on the order of 50 or 60 information bits, or
a time period of a few nanoseconds). This assumption over
such a short period of time is arguably a valid one.

The second factor is the temporal change in the frequencies
of the multitone coupled jammer. It is assumed in this
study that the jammer frequencies do change temporally
over very short periods of time (on the order of a
few nanoseconds). The bispectral search algorithm is
expected to detect any temporal change in the tones
of the jammer. However, within each observation window,
which corresponds to a few nanoseconds of digital
information, the jammer tones are assumed fixed and
the signal is block filtered using the above excision systems.
The bandwidth of the excision filter is chosen
wide enough that any temporal change in the jammer
tones which is undetected by the bispectral search
algorithm remains within the filter's 3-dB notch region.

The third factor is the bandwidth of the notch filter and
how it may change from one jammer tone to another.
The notch filter bandwidth does affect the performance
of the proposed excision system. A choice of $B$ that is
consistent with the frequency resolution limit of the search
algorithm seems to be the best compromise. It is desirable,
in the presence of coupling, that the notch at the
third jammer tone be twice as wide as the frequency
resolution, simply because it is the sum of two
estimated frequencies from the bifrequency plane. The
difference between the excision performance of the two
filters above (cascade or multinotch) seems to
be negligible.

Figure 3 (top) shows the effect of applying bispectral
based excision (the bit error rate being zero for an
80 bit sequence). Note that considerable bit error
is experienced when the jammer frequencies change
by about 1% of their original value (middle). The
bottom panel shows the effect of ignoring the jammer
coupling component (at frequency $\lambda_1 + \lambda_2$), which
is only identifiable in the bispectral domain. Clearly
the information obtained from the bispectrum of the
incoming signal is important for satisfactory jammer
excision.

The performance of the proposed multitone jamming
excision system changes significantly when the
transmitted signal is binary phase shift keying (BPSK)
modulated, and depends on the carrier frequency of
the BPSK system. The effect is primarily due to the
multinotch excision filter used and the separation between
the jammer tones and the carrier. As the carrier
frequency approaches either of the jammer frequencies,
the bit error rate increases significantly, which indicates
that in a BPSK mode the presence of multitone coupled
jamming could have a severe effect on the performance
of a DS spread spectrum communications system.

3.  NON-GAUSSIAN  AR  JAMMING 

The second class of jamming sources considered in this
study are those that can be modeled as MA, AR, or
ARMA processes driven by non-Gaussian noise. It
is well known that the ARMA parameters of a
non-Gaussian random process cannot be accurately estimated
using second order statistics or the power spectral
density. The third order cumulant and the bispectrum
are used to estimate the ARMA parameters
of a non-Gaussian jammer interfering with a pseudo-random
communication signal with zero bispectrum.
An excision filter based on the estimate of the jammer
ARMA parameters is then used to mitigate the effects
of the jammer. The simulations performed in this study
focus on the autoregressive case where the jammer is
synthesized using

$$j(n) = \sum_{k=1}^{m} a_k\, j(n-k) + w(n) \qquad (11)$$

where $\{w(n)\}$ is a non-Gaussian white signal with constant
bispectrum. It is noted in [1] that, as a consequence
of sampling at the chip rate, the desired signal
$s(n)$ is uncorrelated with the received signal $r(n-l)$ for
$l = 1, 2, \ldots, m$, provided that $m$ is less than the length
of the PN sequence. Therefore the jammer (without the
signal $s(n)$) can be predicted from the received signal
using

$$\hat{j}(n) = \sum_{k=1}^{m} a_k\, r(n-k) \qquad (12)$$

This indicates that the jammer AR parameters can
be estimated from the bispectrum or the third order
cumulant of the received signal, after which the incoming
signal is whitened using a transversal filter with
coefficients equal to those of the estimated AR parameters,
given as

$$H(e^{j\omega}) = 1 - \sum_{k=1}^{m} a_k e^{-j\omega k} \qquad (13)$$

Clearly, a better estimate of the jammer AR parameters
can be attained by using higher order statistics
along with second order statistics. There are several
algorithms proposed in the literature for estimating the
parameters of a non-Gaussian AR process. The method
used in this study relies on using cumulant slices of
second and third order [9]. An adaptive AR parameter
estimation technique that relies on the third order
statistics of the incoming signal is more desirable in
scenarios of strong temporal dependence of the AR
parameters.
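The idea of estimating the jammer AR parameters from third-order statistics and then whitening can be sketched as follows. This is not the cumulant-slice method of [9]; it is a simplified stand-in that uses only the third-order slice $d(\tau) = E\{r(n)\, r^2(n-\tau)\}$, which for a zero-mean AR jammer obeys $d(\tau) = \sum_k a_k\, d(\tau - k)$ for $\tau \ge 1$. The AR(2) coefficients, the exponential driving noise, the random $\pm 1$ chip sequence, and all function names are assumptions made for this example.

```python
import numpy as np

def third_order_slice(r, lag):
    """Sample estimate of d(lag) = E[r(n) * r(n - lag)**2] for a zero-mean series."""
    n = len(r)
    if lag >= 0:
        return np.mean(r[lag:] * r[:n - lag] ** 2)
    return np.mean(r[:n + lag] * r[-lag:] ** 2)

def estimate_ar_from_cumulants(r, order):
    """Solve d(tau) = sum_k a_k d(tau - k), tau = 1..order, for the AR coefficients."""
    r = r - r.mean()
    D = np.array([[third_order_slice(r, tau - k) for k in range(1, order + 1)]
                  for tau in range(1, order + 1)])
    d = np.array([third_order_slice(r, tau) for tau in range(1, order + 1)])
    return np.linalg.solve(D, d)

rng = np.random.default_rng(2)
a_true = np.array([0.5, -0.3])                 # assumed AR(2) jammer
n = 400_000
w = rng.exponential(1.0, size=n) - 1.0         # zero-mean, skewed driving noise
jam = np.zeros(n)
for i in range(n):
    jam[i] = w[i]
    if i >= 1:
        jam[i] += a_true[0] * jam[i - 1]
    if i >= 2:
        jam[i] += a_true[1] * jam[i - 2]
chips = rng.choice([-1.0, 1.0], size=n)        # symmetric signal, zero third-order cumulant
r = chips + jam

a_hat = estimate_ar_from_cumulants(r, order=2)
# Transversal whitening filter 1 - sum_k a_k z^{-k}, applied to the received signal.
whitened = np.convolve(r, np.concatenate(([1.0], -a_hat)))[:n]
```

Because the chip sequence is symmetric and white, it contributes nothing to the third-order slice in expectation, so the estimate is driven by the jammer alone; with shorter records the variance of third-order estimates grows quickly.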

4.  CONCLUSIONS 

Higher order spectral analysis in general, and the bispectrum
in particular, provide a unique platform for
identifying multitone jammers with inherent coupling
or correlation. The bispectral frequency pair of each
jammer component is estimated and an excision is applied
at the estimated frequencies as well as their sum.
Autoregressive non-Gaussian jammers are another class
of interference that benefits from cumulant based excision,
where the jammer AR signal is whitened using
a transversal reciprocal filter. The results show the
significance of bispectral based excision techniques in
direct sequence spread spectrum communications, and
they also show the unique ability of bispectral techniques
to mitigate the effects of two potential classes of
jammers.
Figure 1: Bispectral based coupled multitone jammer
excision system


Figure 2: Cumulant based non-Gaussian AR jammer
excision system


Figure 3: Performance in bit error rate versus
jammer-to-signal ratio showing complete jammer excision (top),
the effects of 1% temporal change in jammer frequencies
(middle), and the effect of not nulling the third
coupling component (bottom).
Abstract

Linear  filter  theory  based  on  Wiener  filtering  is  well 
understood  and  used  widely  in  many  fields  of  image  and 
signal  processing.  However,  the  use  of  linear  filters  is 
generally  associated  with  implicit  approximations. 
Therefore,  in  this  work  a  series  of  non-linear  filters  is 
developed  based  on  the  concepts  of  Volterra  series  and 
these  are  applied  to  image  interpolation  problems.  More 
explicitly,  the  aim  is  to  interpolate  one  field  of  a  frame  of 
a  television  picture  to  form  an  estimate  of  the  second 
field.  This  is  known  as  de-interlacing  and  is  useful  in 
many  areas  of  video  processing,  for  example  standards 
conversion.  Conventional  de-interlacing  systems  use  a 
fixed  linear  combination  of  the  pixels  in  the  aperture.  In 
this  paper  we  consider  the  extension  of  these  methods  to 
allow  estimators  based  on  non-linear  combinations  of 
pixel  values. 


1.  Introduction 

Realistic  images  have  non-Gaussian  statistics;  hence  a 
linear  interpolator  is  not  guaranteed  to  be  optimal  and 
there  may  be  a  performance  gain  to  be  exploited  in 
employing  a  non-linear  interpolation  scheme  [4].  Such  a 
non-linear  interpolation  scheme  can  exploit  any  of  the 
plethora  of  non-linear  models  available.  However  we 
concentrate  on  a  Volterra  type  of  model  [1]  in  which  only 
polynomial  combinations  of  pixel  values  in  the  aperture 
are  formed. 

The  de-interlacing  problem  can  be  summarised  as: 
Given  the  knowledge  of  one  field  of  a  frame  of  a 
television  picture,  is  it  possible  to  reconstruct  the  other 
field?  This  is  done  by  assuming  knowledge  of  both  fields 
of  a  particular  frame  and  then  calculating  the  best 
(linear/non-linear)  filter  that  maps  between  them.  This 
filter  can  then  be  applied  to  the  first  field  from  other 
frames  in  order  to  reconstruct  the  whole  frame. 


2.  Non-linear  filtering 

For  a  linear  system  the  relationship  between  the  input 
x(t),  the  output  y(t)  and  the  system’s  impulse  response  h(t) 
is  given  by  the  convolution  integral.  For  de-interlacing,  if 
the  process  of  mapping  between  one  field  and  the  other  is 
a  linear  process  it  would  be  possible  with  the  knowledge 
of  the  two  fields  to  calculate  the  impulse  response  or 
filter, h(t), that when applied to field f1 gave exactly field
f2. In practice this filter would have to be of finite length
and so it would not exactly give field f2. However, in
reality this mapping process is non-linear. This will be
due to a number of reasons, for example, the field will
contain aliased elements, and aliasing is a non-linear
process. Hence it is likely that some form of non-linear
operation will yield a better estimate of f2 from f1.

3.  The  Volterra  Model 

In  time  series  analysis  the  Volterra  model  is  thought  of 
as  an  extension  of  the  linear  convolution  integral.  Rather 
than  calculating  the  convolution  of  the  linear  impulse 
response  of  a  system  with  its  input,  one  considers  an 
infinite  sum  of  higher  order  impulse  responses  convolved 
with  interactions  of  the  input.  A  Volterra  model, 
truncated  at  the  second  order,  contains  only  linear  and 
quadratic  filters  and  so  is  only  able  to  model  systems 
which  contain  quadratic  type  non-linear  elements.  Such 
models  are  useful  when  identifying  non-linear  systems 
which  skew  the  input.  A  Volterra  model  truncated  at  the 
third  order,  contains  linear,  quadratic,  and  cubic  filters. 
These  models  allow  the  identification  of  non-linear 
systems  which  symmetrically  distort  the  input  signal. 
Higher  order  Volterra  models  allow  even  greater 
generalisations. There are two penalties associated with
the use of higher order models: firstly, the computational
requirements increase dramatically; secondly, the data
lengths required to obtain accurate parameter estimates
also increase. In discrete form the Volterra model can be
expressed as
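$$
\begin{aligned}
y(n) ={}& \sum_{k_1=0}^{\infty} h_1(k_1)\, x(n-k_1) \\
&+ \sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty} h_2(k_1,k_2)\, x(n-k_1)\, x(n-k_2) \\
&+ \sum_{k_1=0}^{\infty}\sum_{k_2=0}^{\infty}\sum_{k_3=0}^{\infty} h_3(k_1,k_2,k_3)\, x(n-k_1)\, x(n-k_2)\, x(n-k_3) \\
&+ \cdots + \sum_{k_1=0}^{\infty}\cdots\sum_{k_M=0}^{\infty} h_M(k_1,\ldots,k_M)\, x(n-k_1)\cdots x(n-k_M)
\end{aligned}
$$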

$h_M(k_1, \ldots, k_M)$ is called the $M$-th order Volterra kernel or
Volterra filter. Note that a linear system is just a first
order Volterra system and hence only has a first order
filter $h_1(k)$. To fully characterise a non-linear system
using  the  Volterra  series  it  is  necessary  to  calculate  an 
infinite  number  of  higher  order  filters.  In  practice  this  is 
not  possible  and  so  it  is  necessary  to  truncate  the  Volterra 
series  at  some  order.  In  this  work  we  will  consider 
Volterra  filters  which  are  truncated  at  the  second  order, 
third  order  and  fifth  order. 
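A truncated Volterra filter is straightforward to evaluate directly. The sketch below computes the output of a second-order model with finite memory; the memory length, the kernel values, and the function name are assumptions chosen only to illustrate the structure of the linear and quadratic terms.

```python
import numpy as np

def volterra_second_order(x, h1, h2):
    """Output of a Volterra model truncated at second order with N-tap kernels.

    h1 has shape (N,), h2 has shape (N, N); samples before x(0) are taken as zero.
    """
    x = np.asarray(x, dtype=float)
    n_taps = len(h1)
    padded = np.concatenate([np.zeros(n_taps - 1), x])
    y = np.empty(len(x))
    for n in range(len(x)):
        window = padded[n:n + n_taps][::-1]        # x(n), x(n-1), ..., x(n-N+1)
        y[n] = h1 @ window + window @ h2 @ window  # linear term + quadratic term
    return y

rng = np.random.default_rng(5)
x = rng.standard_normal(200)
h1 = np.array([0.5, 0.3, -0.2])

# With a zero quadratic kernel the model reduces to ordinary linear convolution.
y_linear = volterra_second_order(x, h1, np.zeros((3, 3)))

# With only h2(0, 0) = 1 the model is a memoryless squarer.
h2_square = np.zeros((3, 3))
h2_square[0, 0] = 1.0
y_square = volterra_second_order(x, np.zeros(3), h2_square)
```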

4.  Optimisation  of  filters 

Consider  a  system  as  in  figure  1.  The  object  is  to 
design  an  N  point  digital  finite  impulse  response  filter,  h, 
to  modify  the  input,  x(n),  in  such  a  way  as  to  minimise  the 
mean  square  error,  e(n),  between  the  filter  output  and  the 
desired signal, y(n). In the case of de-interlacing, x(n) is
field f1 and y(n) is field f2. The aim is to create a filter, h,
that when operating upon f1 gives an optimal estimate of f2
in the sense that the mean squared error is minimised.



The  filter  impulse  response  which  minimises  the  sum 
of  the  squared  errors  of  data  of  length  L,  is  given  by  the 
solution  of  the  over-determined  (assuming  L>N)  system 
of  equations 


$$Xh = y$$

where

$$
X = \begin{bmatrix}
x(L) & x(L-1) & \cdots & x(L-N+1) \\
x(L-1) & x(L-2) & \cdots & x(L-N) \\
\vdots & \vdots & & \vdots \\
x(2) & x(1) & \cdots & 0 \\
x(1) & 0 & \cdots & 0
\end{bmatrix},
\qquad
y = \begin{bmatrix}
y(L) \\ y(L-1) \\ \vdots \\ y(2) \\ y(1)
\end{bmatrix}
$$

the least squares solution of which is [2],

$$h = (X^T X)^{-1} X^T y.$$

Note $X^T X$ and $X^T y$ are usually much smaller than $X$.
Hence, it is much more efficient to compute $X^T X$ and $X^T y$
directly from x(n) and y(n) rather than to form $X$.
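The two routes to the least squares filter can be compared in a short sketch: one forms the data matrix explicitly, the other accumulates $X^T X$ and $X^T y$ as correlation sums without ever building $X$. The rows here are stored in increasing time order, which does not change the solution; the test filter, data length, and function names are assumptions for illustration.

```python
import numpy as np

def data_matrix(x, n_taps):
    """Rows [x(n), x(n-1), ..., x(n-N+1)], with zeros before the first sample."""
    padded = np.concatenate([np.zeros(n_taps - 1), x])
    return np.array([padded[n:n + n_taps][::-1] for n in range(len(x))])

def least_squares_filter(x, y, n_taps):
    """Solve the over-determined system X h = y in the least squares sense."""
    h, *_ = np.linalg.lstsq(data_matrix(x, n_taps), y, rcond=None)
    return h

def normal_equations_direct(x, y, n_taps):
    """Build X^T X and X^T y from x(n) and y(n) directly, without forming X."""
    length = len(x)
    gram = np.empty((n_taps, n_taps))
    cross = np.empty(n_taps)
    for i in range(n_taps):
        cross[i] = x[:length - i] @ y[i:]              # sum_n x(n - i) y(n)
        for j in range(n_taps):
            start = max(i, j)
            gram[i, j] = x[start - i:length - i] @ x[start - j:length - j]
    return np.linalg.solve(gram, cross)

rng = np.random.default_rng(6)
h_true = np.array([0.25, 0.5, 0.25, -0.1])             # hypothetical 4-tap filter
x = rng.standard_normal(500)
y = np.convolve(x, h_true)[:len(x)]

h_lstsq = least_squares_filter(x, y, 4)
h_direct = normal_equations_direct(x, y, 4)
```

Solving the normal equations is cheaper in memory but squares the condition number of the problem, so an orthogonal solver such as `numpy.linalg.lstsq` may be preferable when the data matrix is poorly conditioned.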

The same method can be extended to obtain an
estimate of the $M$-th order non-linear filter. For the sake of
this  explanation,  a  second  order  Volterra  system  which 
contains  only  linear  and  quadratic  filters  will  be 
considered,  although  it  is  conceptually  easy  to  extend  the 
method  to  any  arbitrary  order. 

The  extension  of  this  to  a  more  general  Volterra  model 
is  in  principle  simply  a  matter  of  modifying  the  data 
matrix  X.  Below  we  show  the  data  matrix  for  a  second 
order  Volterra  model,  in  which  a  constant  (DC)  term  has 
also  been  included.  A  symmetric  form  for  the  Volterra 
kernel  has  been  assumed  so  this  matrix  has  dimension 


Figure  1:  Linear  filtering 
The optimal filter, in the least squares sense, can then
be estimated by computing $(X^T X)^{-1} X^T y$. The filter will
contain three separate components: the DC term, the
standard linear coefficients which should be multiplied by
single pixel values, and the quadratic coefficients which
will be multiplied by products of pixel values. A six point
cubic Volterra filter is shown as an example in figure 3.
The first point is the DC term, the next six points are the
linear coefficients which can be seen, typically, to have an
approximate sin(x)/x structure. The next 21 points are the
quadratic coefficients, and the final 56 points are the cubic
coefficients.
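One way to realise the modified data matrix is to give each row one column per distinct monomial of the aperture pixels, which corresponds to assuming symmetric kernels. The sketch below does this for a six pixel aperture up to third order and fits the coefficients by least squares; the synthetic pixel data, the target mapping, and the function names are hypothetical and serve only to show the construction.

```python
import numpy as np
from itertools import combinations_with_replacement

def volterra_terms(n_pixels, max_order):
    """Index tuples of the distinct monomials, lowest order first; () is the DC term."""
    terms = []
    for order in range(max_order + 1):
        terms.extend(combinations_with_replacement(range(n_pixels), order))
    return terms

def volterra_design_matrix(apertures, max_order):
    """One row per aperture, one column per monomial of the pixel values."""
    apertures = np.asarray(apertures, dtype=float)
    terms = volterra_terms(apertures.shape[1], max_order)
    columns = [np.prod(apertures[:, np.array(t, dtype=int)], axis=1) for t in terms]
    return np.column_stack(columns), terms

rng = np.random.default_rng(3)
pixels = rng.uniform(0.0, 1.0, size=(2000, 6))          # synthetic six pixel apertures
target = (1.0 + 2.0 * pixels[:, 0] - pixels[:, 1] * pixels[:, 2]
          + 0.5 * pixels[:, 3] ** 3)                     # assumed non-linear mapping

A, terms = volterra_design_matrix(pixels, 3)
coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
counts = [sum(len(t) == order for t in terms) for order in range(4)]
```

For six pixels the columns split into 1 DC term, 6 linear, 21 quadratic and 56 cubic terms, matching the layout described for the six point cubic filter.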

5.  Choice  of  filter  order  and  aperture 

A Volterra model truncated at the third order is used
predominantly in this work. This contains linear,
quadratic, and cubic filters and so is able to model
systems which contain both quadratic and cubic non-linear
elements. These generate both skewed and symmetric
distortions of the probability density function. Higher
order models can be used and are shown to give improved
results, but the size of the filter and the computation
required in its estimation rise steeply and there are
rapidly diminishing returns. For example, the fifth order
six pixel Volterra filter does perform better than the
third order six pixel filter but there are over five times as
many terms.
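The growth in filter size can be checked with a short calculation. Assuming symmetric kernels and a DC term, the number of coefficients of an order $M$ filter on $N$ pixels is the number of monomials of degree at most $M$ in $N$ variables, $\binom{N+M}{M}$; the helper name below is hypothetical.

```python
from math import comb

def n_volterra_coefficients(n_pixels, order):
    """Monomials of degree <= order in n_pixels variables (DC term included)."""
    return comb(n_pixels + order, order)

cubic_4 = n_volterra_coefficients(4, 3)
cubic_6 = n_volterra_coefficients(6, 3)
cubic_8 = n_volterra_coefficients(8, 3)
fifth_6 = n_volterra_coefficients(6, 5)
growth = fifth_6 / cubic_6      # fifth order versus third order on six pixels
```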

For the linear case a simple four point vertical filter,
see figure 2, gives an error of 18.61. Increasing the
number of taps does not significantly reduce the mean
squared error (the error for an eight point vertical filter is
18.58, and for a 36 point filter is 18.55), nor does utilising
pixels in the horizontal direction significantly reduce the
error.

For the non-linear case the choice of aperture has much
more dramatic results. A two-dimensional aperture does
give a significant improvement over a one-dimensional
one. This is thought to be due to the ability of the non-linear
filter to deal with sloping edges and lines, for which
the filter needs gradient information. However, the
number of filter coefficients rises rapidly with the number
of pixels. Due to computational constraints the maximum
size for a cubic filter is 20 pixels and for a fifth order filter
is 6 pixels.

As the number of pixels available is limited it is
important to choose the correct shape of aperture. A good
compromise is an aperture that contains four vertical pixels
and then a number of horizontal pixels. The apertures
used for the 4, 6, 8 and 20 pixel filters are shown in figure
2 (x denotes the pixels used in field f1 to estimate the
pixel denoted by • in field f2). It is thought that the
horizontal pixels help to cope with the near horizontal
lines and edges that often cause problems due to jagging
in a de-interlaced picture.

[Figure 2 panels: 4 pixel aperture, 6 pixel aperture, 8 pixel aperture, 20 pixel aperture]

Figure 2: Choice of aperture for non-linear filters

6.  Results  for  Volterra  filters 

Table 1 shows the mean squared error between the
estimated field and actual field for the picture ‘girl’ for a
series of different filters. It can be seen that in all cases
the non-linear filters perform better than standard linear
filters. The higher order non-linear filters produce very
good results, with smooth edges on both curves and
straight lines and little jagging. They only differ from the
original field in regions of fine detail, where no filter
could be expected to work since it cannot create
information that is not in the input field.


Filter type                  Mean square error   Number of coefficients in filter
4 pixel linear filter        18.61               4
8 pixel linear filter        18.58               8
36 pixel linear filter       18.55               36
4 pixel cubic filter         16.14               35
6 pixel cubic filter         15.67               84
8 pixel cubic filter         15.21               165
20 pixel cubic filter        13.50               1770
6 pixel fifth order filter   14.88               462

Table 1: Mean squared errors for various filters used on picture ‘girl’

Figures 4 and 5 show a small section from a de-interlaced
version of the picture ‘girl’, estimated with an
optimum four point linear filter and with an optimum twenty
pixel cubic Volterra filter, respectively. These can be
compared with the original picture, shown in figure 6. It
can be seen how the non-linear filter produces much
smoother edges and curves than its linear counterpart, with
reduced jagging. This is most evident around the inner
curve of the table leg.


Figure 3: Filter coefficients for a six pixel cubic
Volterra filter


Figure 4: Section of ‘girl’ estimated with a
4 point linear filter


Figure 6: Section of ‘girl’ from original field

7.  Conclusions 

•  Non-linear  Volterra  type  filters  can  give  dramatically 
improved  performance  over  conventional  linear  predictors 
when  used  for  de-interlacing. 

•  Volterra  filters  are  generally  more  complex  than  their 
linear  equivalents,  both  to  design  and  implement. 

•  So far these techniques have only been applied to
de-interlacing, but there are many other areas in this field
where we can expect non-linear techniques to give
improved performance.
Figure 5: Section of ‘girl’ estimated with a
20 point cubic Volterra filter
Abstract

Higher  Order  Statistics  (HOS)  are  well  suited  to 
solving  detection  and  classification  problems  because 
they  can  suppress  gaussian  noise  and  preserve  some  of 
the  non-gaussian  information  [1]  [2].  This  paper 
describes  the  use  of  these  methods  for  acoustic  quality 
control  of  manufactured  goods  on  a  production  line,  and 
specifically  the  detection  of  faulty  fanbelts  on  the  drying 
block  of  a  washing  machine.  Two  HOS  based  methods 
were  used  in  this  paper.  The  first  is  based  on  the 
properties  of  the  bispectrum  in  the  outer  triangle  and 
particularly  of  the  normalized  bispectrum  also  called 
skewness  function.  The  second  uses  the  third  order 
cumulant  of  a  Matched  Filter  output.  This  combination 
has  the  advantages  of  matched  filtering  plus  the 
properties  of  higher  than  second  order  statistics  making 
this  algorithm  more  robust  than  the  conventional  matched 
filter.  The  method  was  used  to  classify  real  signals  from 
fanbelts  suffering  from  specific  known  defects. 


1. Introduction

For detecting defects in manufactured goods on a
production line, acoustical quality control is a very
attractive tool because of its simple handling. Sounds
frequently reveal the defects in an engine and can even
identify the fault. This study was designed to detect,
non-destructively, faulty fanbelts on the drying block of a
washing machine at the end of the production line. The
fanbelt defect appears as a shock in the acoustical
signature and changes the shape of its probability density
function. From this point of view, we suggest using
Higher Order Statistics. In fact, HOS, with their ability to
suppress gaussian noise and to preserve some of the
non-gaussian information, seem to be better suited to
analysing measurements made in a noisy environment.

The first section describes the use of a bispectrum
based detector. This detector uses the following property
of the bispectrum: for a stationary and unaliased
signal, the bispectrum will be zero in the Outer Triangle
(OT) [3]. When the process is nonstationary, Hinich and
Wolinsky [4] have shown that the bispectrum in the OT is
usually not zero. A transient causes the observed signal to
become nonstationary in the window where the shock is
present.

The second section examines another detection scheme using
HOS. The algorithm was proposed by Giannakis and Tsatsanis [5],
and combines the matched filtering operation with properties
of HOS.

The last section examines two different fanbelt fault classes,
called “click” and “beating”. The beat is caused by defective
belt tension due to its pull. The click is due to the fanbelt
hitting the drying block of the washing machine, producing a
sharp sound in the acoustical signature.

To separate the different classes we use the method of
Giannakis and Tsatsanis, and the results are compared to those
obtained with the conventional matched filter.

2. The detection hypothesis testing

Let us consider the following simple binary hypothesis
testing problem:

$$H_0 : X(i) = N(i)$$

$$H_1 : X(i) = S(i) + N(i)$$

for $i = 0, 1, \ldots, N$,

where S(i) is a known reference signal and N(i) is a sample
realization from a zero-mean, Gaussian noise process with
unknown covariance sequence. The detection method must decide
between $H_0$ and $H_1$ by analysing the received signal X(i).

If S(i) is a non-Gaussian signal with non-zero kth-order
polyspectrum (or moment spectrum), and if the noise is
Gaussian, then the noise cumulants
$c_{kN}(m_1, \ldots, m_{k-1})$ and higher order spectra
$S_{kN}(f_1, \ldots, f_{k-1})$ are zero for $k > 2$ over their
corresponding time and frequency domains. The above hypothesis
testing can be reformulated in the higher order domain as:

$$H_0 : c_{kX}(m_1, \ldots, m_{k-1}) = 0$$

$$H_1 : c_{kX}(m_1, \ldots, m_{k-1}) \neq 0$$

in the time domain for all lags $(m_1, \ldots, m_{k-1})$.

Similarly in the frequency domain we have:

$$H_0 : S_{kX}(f_1, \ldots, f_{k-1}) = 0$$

$$H_1 : S_{kX}(f_1, \ldots, f_{k-1}) \neq 0$$

over the whole principal domain.

A  thorough  review  of  higher  order  statistics  can  be  found 
in  [4]. 


3. The bispectrum based detector

Reviews of the bispectrum can be found in [1] and [6]. It is
nevertheless useful to recall the properties of the bispectrum
in the principal domain.

Because of the symmetry properties of the bispectrum, it is
sufficient to consider its behavior in the principal domain
(figure 1), defined as the triangle
$\{0 < g < f,\ 2f + g < N\}$.


Figure 1 : The principal domain (in normalized pulsation)

The principal domain is divided into two regions. The first
region, satisfying $\{f + g \le N/2\}$, is called the inner
triangle IT
($IT = \mathrm{triangle}[(0,0), (\pi,0), (\pi/2,\pi/2)]$ in
figure 1). The remaining region is called the outer triangle OT
($OT = \mathrm{triangle}[(\pi,0), (\pi/2,\pi/2), (2\pi/3,2\pi/3)]$
in figure 1).

For a stationary unaliased signal, the bispectrum is zero in
the OT triangle. Moreover, if the signal is Gaussian, the
bispectrum is theoretically zero over the whole principal
domain. If the process is nonstationary, Hinich and Wolinsky
[4] proved that the bispectrum will usually be nonzero. A
non-random transient signal in additive stationary Gaussian
noise makes the observed signal nonstationary in the window
where the signal is present. See also [3] and [7] for more
detail.

For a sampled signal, an unbiased estimate of the discrete
bispectrum, called the biperiodogram, is given by:

$$B(f,g) = \frac{1}{N+1}\, X(f)\, X(g)\, X^*(f+g) \qquad (1)$$

where $X(f) = \sum_{n=0}^{N} x(n)\, e^{-j 2\pi f n/(N+1)}$
represents the discrete Fourier transform of the N+1 samples
$\{x(n);\ 0 \le n \le N\}$.

The biperiodogram is unbiased but its variance is proportional
to the product of the Fourier transforms. Furthermore, as N
becomes larger, the variance of the estimate increases, so the
estimate is inconsistent.

A consistent estimate can be obtained by dividing the data
record into K frames of L samples, with or without overlapping.
A time domain window $w(n)$ can be applied to each frame and
the FFT computed by:

$$X_k(f) = \sum_{n=0}^{L-1} w(n)\, x_k(n)\, e^{-j 2\pi f n/L}$$

The bispectrum $B_k(f,g)$ of the kth frame is computed as in
(1). The final estimate is obtained by averaging across the K
frames:

$$B(f,g) = \frac{1}{K} \sum_{k=1}^{K} B_k(f,g)$$
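The frame-averaging procedure can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the paper: the function name `averaged_bispectrum`, the use of non-overlapping frames, the removal of each frame's mean and the wrap-around of the bin f + g modulo the frame length are all assumptions made for the example.

```python
import numpy as np

def averaged_bispectrum(x, frame_len, window=None):
    """Average the biperiodogram X(f) X(g) X*(f+g) / L over non-overlapping frames."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    if n_frames == 0:
        raise ValueError("record shorter than one frame")
    if window is None:
        window = np.ones(frame_len)            # rectangular window by default
    idx = np.arange(frame_len)
    # DFT bin of f + g, wrapped modulo the frame length (assumption)
    sum_idx = (idx[:, None] + idx[None, :]) % frame_len
    acc = np.zeros((frame_len, frame_len), dtype=complex)
    for k in range(n_frames):
        frame = x[k * frame_len:(k + 1) * frame_len]
        frame = frame - frame.mean()           # assumption: remove the frame mean
        X = np.fft.fft(window * frame)
        acc += X[:, None] * X[None, :] * np.conj(X[sum_idx]) / frame_len
    return acc / n_frames

# Usage: three phase-coupled cosines at bins 2, 3 and 5 of a 16-point frame
n = np.arange(64)
x = (np.cos(2 * np.pi * 2 * n / 16)
     + np.cos(2 * np.pi * 3 * n / 16)
     + np.cos(2 * np.pi * 5 * n / 16))
B = averaged_bispectrum(x, frame_len=16)
```

In this toy example the three components are phase coupled (bin 2 + bin 3 = bin 5), so the estimate has a large value at the bifrequency (3, 2) and is close to zero at bifrequencies that involve empty bins.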

The method used in this paper is based on the change of
amplitude and temporal skew of the signal when a transient
appears. A natural measure of this skew is the third order
cumulant. Parsons and Williams [8] showed that this measure
can be constructed in the bispectral domain as follows:

- for amplitude skew:

$$d_a = \sum_{f,g} \left(\mathrm{Re}\, B(f,g)\right)^2$$

and, for temporal skew:

$$d_t = \sum_{f,g} \left(\mathrm{Im}\, B(f,g)\right)^2$$

If these measures are summed over the OT triangle, we obtain a
more robust statistic for testing the deviation from
stationarity when the transient signal appears. The output of
this bispectral energy detector is:

$$d = \sum_{OT} |B(f,g)|^2$$
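As an illustration of the statistic d, the following sketch builds a boolean mask of the outer triangle on a discrete bifrequency grid and sums the squared bispectrum magnitude over it. The helper names `outer_triangle_mask` and `ot_energy` are hypothetical, and the choice of strict inequalities on the triangle boundaries is an assumption; the input B is any L x L bispectrum estimate indexed by DFT bins.

```python
import numpy as np

def outer_triangle_mask(L):
    """Boolean mask of the outer triangle for an L x L grid of DFT bins (f, g)."""
    f = np.arange(L)[:, None]
    g = np.arange(L)[None, :]
    # Principal domain: 0 < g < f and 2f + g < L (strict boundaries assumed)
    principal = (g > 0) & (g < f) & (2 * f + g < L)
    # Outer triangle: the part of the principal domain with f + g > L/2
    return principal & (f + g > L / 2)

def ot_energy(B):
    """Detector output d: sum of |B(f,g)|^2 over the outer triangle."""
    mask = outer_triangle_mask(B.shape[0])
    return float(np.sum(np.abs(B[mask]) ** 2))

# Usage on a dummy 16 x 16 "bispectrum" of ones
mask16 = outer_triangle_mask(16)
d_demo = ot_energy(np.ones((16, 16), dtype=complex))
```

On short frames the outer triangle contains very few bins (two for L = 16 with these boundary conventions), which is one reason why longer frames or many averaged frames are needed in practice.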

The detection scheme is:

[Block diagram: r(t), then $\sum_{OT} |B(f,g)|^2$, then
threshold comparison]

Figure 2 : Block diagram of the bispectral energy detector

The hypothesis testing becomes:

- If hypothesis $H_0$ is true, the measure d will be zero.

- If hypothesis $H_1$ is true, the signal will be
non-stationary and the measure d in OT will be different from
zero. The presence of a transient signal is detected by
thresholding. The performance of this detector is evaluated by
simulation in [9].

The performance of the test can be improved by using the
normalized bispectrum, also called the skewness function:


$$B_{sk}(f,g) = \frac{|B(f,g)|^2}{P(f)\, P(g)\, P(f+g)}$$

where P(f) is the final estimate of the power spectrum. The
measure d is now defined as:

$$d = \sum_{OT} B_{sk}(f,g)$$

When the transient appears, d increases because the signal
becomes locally non-stationary.

Figure 3 shows the skewness function with (a) and without (b)
a fault.

Figure 3 : Skewness function with (a) and without (b) a fault
However, this method is not adequate for our application. The
test performs well only if the size of the analysing window is
adapted to the transient size. In our case it is not possible
to know the duration of the transient in the acoustical
signature, and the results were very poor. For this reason we
prefer to use the algorithm described in the next section.


4. Matched filtering and HOS detector

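Let us consider that the reference signal s(i) has finite
duration, so that the filter matched to this signal has the
finite impulse response $h(i) = s(N - i)$. If the MF is excited
by x(i), then the kth-order correlation of the MF’s output
y(i) is given by:

$$r_{ky}(\tau_1, \ldots, \tau_{k-1}) = \sum_{j_1, \ldots, j_{k-1} = -N}^{N} r_{ks}(j_1, \ldots, j_{k-1})\, r_{kx}(j_1 + \tau_1, \ldots, j_{k-1} + \tau_{k-1})$$

In the noise-free case we obtain:

$$r_{ky}(\tau_1, \ldots, \tau_{k-1}) = \sum_{j_1, \ldots, j_{k-1} = -N}^{N} r_{ks}(j_1, \ldots, j_{k-1})\, r_{ks}(j_1 + \tau_1, \ldots, j_{k-1} + \tau_{k-1})$$

and for zero lag:

$$r_{ky}(0, \ldots, 0) = \sum_{j_1, \ldots, j_{k-1} = -N}^{N} r_{ks}^2(j_1, \ldots, j_{k-1}) = E_{ks}$$

where $E_{ks}$ is the energy of the signal’s kth-order
correlation. The Cauchy-Schwarz inequality can be used to show
that:

$$|r_{ky}(\tau_1, \ldots, \tau_{k-1})| \le r_{ky}(0, \ldots, 0)$$

therefore, the kth-order correlation peaks at the origin.
This is the basis for the detection algorithm proposed by
Giannakis and Tsatsanis in [5].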

- Under the $H_0$ hypothesis, the kth-order correlation of the
MF’s output tends toward the kth-order cumulant of the filtered
noise, which is zero (for k > 2) if the noise is Gaussian.

- Under the $H_1$ hypothesis, the kth-order cumulant of the
MF’s output tends toward the kth-order cumulant energy of the
signal s(i), provided the signal has zero mean and several
independent records are available (see [5] for details).

The detection rule becomes a comparison of
$|c_{ky}(0, \ldots, 0)|$, the absolute value of the zero-lag
kth-order cumulant of the filtered observation sequence, with a
threshold. For k=3, the block diagram of the detector is
shown in figure 4.


Figure 4 : Block diagram of the third-order cumulant based
detector

In practice, the true correlation value is not available.
It can be estimated by computing the third-order correlation
directly; for zero lag we obtain the simple sum:

$$\hat r_{3y}(0,0) = \frac{1}{2N+1} \sum_{i=0}^{2N} y^3(i) = \hat c_{3y}(0,0)$$

since cumulants coincide with correlations up to third-order
statistics for zero-mean data. If $k \ge 4$, we need to compute
the kth-order cumulant of the MF’s output. We can also compute
a more robust estimate by averaging over several independent
segments, since insensitivity to Gaussian noise holds only in
the mean.
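The third-order detector can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the names `zero_lag_third_order` and `detect` are hypothetical, the statistic is the sample mean of the cubed matched-filter output, and the threshold is a free parameter that would have to be set from noise-only data.

```python
import numpy as np

def zero_lag_third_order(x, s):
    """Zero-lag third-order correlation of the matched-filter output."""
    s = np.asarray(s, dtype=float)
    x = np.asarray(x, dtype=float)
    h = s[::-1]                    # matched filter h(i) = s(N - i)
    y = np.convolve(x, h)          # full output: 2N + 1 samples for N + 1 inputs
    return float(np.mean(y ** 3))

def detect(records, s, threshold):
    """Average the statistic over independent records, then threshold its modulus."""
    stat = abs(float(np.mean([zero_lag_third_order(r, s) for r in records])))
    return stat > threshold, stat

# Usage on a toy reference signal (noise-free observation)
s_ref = np.array([1.0, 2.0, 1.0])
single = zero_lag_third_order(s_ref, s_ref)
decision, value = detect([s_ref, s_ref], s_ref, threshold=50.0)
```

Averaging the signed statistic over records before taking the absolute value matters: the contribution of symmetric noise vanishes only in the mean, so taking the modulus record by record would reintroduce a noise-dependent bias.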

This  detector  can  be  viewed  as  a  cumulant  energy 
detector  or  in  the  frequency  domain  like  a  polyspectral 
energy  detector.  This  algorithm  is  attractive  because  it 
requires  the  computation  of  a  single  cumulant  lag. 

Our application has not one but several reference signals (one
for each class of fanbelts). We are therefore interested in
generalizing the detection algorithm to classification
problems. In this case, we have a bank of L MF’s with impulse
responses $h_l(i)$, $l = 1, \ldots, L$. The filters are
normalized to have a zero mean and equal kth-order cumulant
energy $E_k$. One way to accomplish this is by scaling, i.e.:

$$\tilde h_l(i) = \lambda_l\, h_l(i)$$

where the scale factors $\lambda_l$ are chosen such that all
filters have the same energy $E_k$ [5].

The classification relies on a property of higher order
statistics that can be summarized as follows: “if a bank of
equal correlation energy MF’s is used to classify an incoming
signal, the maximum zero-lag kth-order correlation appears at
the output of the filter matched to the incoming signal”.


Figure  5  :  The  third  order  cumulant-based 
classifier 
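The classifier bank can be illustrated for k = 3 with deterministic signals. The sketch below is an assumption-laden illustration, not the paper's implementation: the function names are hypothetical, the third-order correlation is computed by brute force, and each filter is scaled by a factor lambda = (E / E_s)^(1/6), which follows from the fact that scaling a signal by lambda scales its third-order correlation by lambda^3 and its correlation energy by lambda^6.

```python
import numpy as np

def third_order_correlation(s):
    """Deterministic r3(j1, j2) = sum_i s(i) s(i+j1) s(i+j2) for all lags."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    pad = np.concatenate([np.zeros(n - 1), s, np.zeros(n - 1)])
    lags = range(-(n - 1), n)
    r = np.zeros((2 * n - 1, 2 * n - 1))
    for a, j1 in enumerate(lags):
        for b, j2 in enumerate(lags):
            r[a, b] = sum(s[i] * pad[n - 1 + i + j1] * pad[n - 1 + i + j2]
                          for i in range(n))
    return r

def correlation_energy(s):
    """Energy of the third-order correlation (sum of squares over all lags)."""
    return float(np.sum(third_order_correlation(s) ** 2))

def build_bank(references, target_energy=1.0):
    """Zero-mean matched filters scaled to equal third-order correlation energy."""
    bank = []
    for s in references:
        s = np.asarray(s, dtype=float)
        s = s - s.mean()
        lam = (target_energy / correlation_energy(s)) ** (1.0 / 6.0)
        bank.append(lam * s[::-1])             # time-reversed, scaled reference
    return bank

def classify(x, bank):
    """Pick the filter whose output has the largest zero-lag third-order correlation."""
    stats = [float(np.sum(np.convolve(x, h) ** 3)) for h in bank]
    return int(np.argmax(stats)), stats

# Usage with two toy zero-mean, skewed references
s1 = np.array([3.0, -1.0, -1.0, -1.0])
s2 = np.array([1.0, 1.0, 1.0, -3.0])
bank = build_bank([s1, s2])
label1, _ = classify(s1, bank)
label2, _ = classify(s2, bank)
```

The signed statistic is compared across filters; with equal correlation energies, the Cauchy-Schwarz argument of this section guarantees that a noise-free input produces its largest value at the filter matched to it.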


5.  Experimental  results 

The reference signatures for the design of each matched filter
were obtained by synchronous averaging over 1500 records made
in an anechoic chamber for each class of fanbelts. Each
fanbelt was assigned to a reference class by a human expert.
When the decision was ambiguous, the fanbelts with the greatest
ambiguity were excluded from the classification testing.

Test signals were obtained by synchronous averaging over ten
records and sent to the filter bank. The cumulant estimate was
obtained by averaging the cumulants of the MF outputs over
several independent records.

The results are shown in table 1 for 15 averages of the
filter output:

True          Classified conditions
conditions    ‘good’     ‘beat’     ‘click’
‘good’        86.5%      0%         13.5%
‘beat’        6.5%       67%        26.5%
‘click’       0%         0%         100%

Table 1 : Results of the classification procedure with HOS
classifiers


and for the conventional matched filter:

True          Classified conditions
conditions    ‘good’     ‘beat’     ‘click’
‘good’        69%        11%        20%
‘beat’        14%        63%        23%
‘click’       6.2%       30%        63.8%

Table 2 : Results of the classification procedure with
conventional MF classifiers


Tables 1 and 2 show that the results for the HOS classifier
were much better than those for the MF classifier. This
improvement is due to the HOS classifier being less sensitive
to ambient noise. The number of averages in the cumulant
estimation also influences detection.
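To put a single number on the comparison, the correct-classification rates on the diagonals of the two tables can be averaged. The short sketch below does this arithmetic; the equal weighting of the three classes is an assumption, since the number of test signals per class is not given.

```python
import numpy as np

# Rows: true condition, columns: classified condition (good, beat, click), in %
hos = np.array([[86.5, 0.0, 13.5],
                [6.5, 67.0, 26.5],
                [0.0, 0.0, 100.0]])
mf = np.array([[69.0, 11.0, 20.0],
               [14.0, 63.0, 23.0],
               [6.2, 30.0, 63.8]])

def mean_correct_rate(confusion):
    """Average of the diagonal entries, assuming equally weighted classes."""
    return float(np.trace(confusion) / confusion.shape[0])

hos_rate = mean_correct_rate(hos)   # (86.5 + 67 + 100) / 3
mf_rate = mean_correct_rate(mf)     # (69 + 63 + 63.8) / 3
```

Under this equal-weight assumption the HOS classifier averages 84.5% correct decisions against roughly 65.3% for the conventional matched filter, with the largest gain on the ‘click’ class.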

6.  Conclusion 

This paper describes the use of higher order statistics to
classify fanbelt defects by acoustical analysis. Their use
resulted in excellent performance of the third order cumulant
detector compared with the matched filter. Performance can be
improved by using longer records and a larger number of
averages for the cumulant estimation. Other methods are
presently being investigated to improve the classification; in
particular, time-frequency analysis appears to be a promising
tool [10].

Abstract

The search for discriminating features is a crucial point
when a modulation classification task is addressed. This
paper introduces new features based on a combination of
fourth- and second-order temporal cyclic cumulants. Such a
combination enhances the theoretical discrimination that can
be achieved by a single stationary cumulant, and moreover, the
cyclic parameters become discriminating, whereas this is not
the case when they are considered at pure orders. As an
application, we propose a process to classify 4-PSK vs 16-QAM
modulations. The classification is achieved by estimating the
feature for the received signal, and comparing it with
theoretical ones by a matched filter technique. Simulations
show that although the cyclic parameters are a priori more
discriminating than their stationary counterparts, the
variance of their estimates may cancel this advantage.


1.  Problem  statement 

The modulation classification process consists in determining
the modulation type of an intercepted signal corrupted by
noise. This is a challenging problem that has been
investigated for several years. Since a general approach to
the modulation classification process is very difficult to
draw up, each case has to be considered separately, and
specific methods must be found according to the classes of
modulation we want to discriminate. One of the classical ways
to tackle the problem of modulation classification is based on
maximum likelihood theory [2]. It consists in processing the
log-likelihood function of the signal (or an approximation
thereof) and then comparing it to an appropriate threshold.
This very general approach can be restricted to some features
extracted from the signal, for which the theoretical
expressions are perfectly known. For example, QAM modulation
classification has recently been achieved [3] using stationary
fourth-order cumulants of the signal as discriminating
features.

In our paper, we generalize the discriminating feature given
in [3], by considering the cyclostationary property of x(t)
and highlighting an original way to enhance the
discrimination.

The use of the (second- or higher-order) cyclostationarity
property of time series has already proved to be very helpful
in many applications [1][8], especially to characterize
telecommunication signals. The use of this property gives
access to more information about the analyzed signal and can
be very helpful when a modulation classification task is
addressed. In our application, we will show that the
introduction of the cyclostationary property leads to features
that are more discriminating from a theoretical point of view.
However, since reasonably good estimates of cyclic parameters
require many more samples than for stationary parameters, both
methods have to be compared, which will be done in the sequel.

0-8186-8005-9/97 $10.00 © 1997 IEEE

In our algorithm the discriminating feature is obtained by a
proper combination of second- and fourth-order cyclic temporal
cumulants. This approach is quite new for two reasons.
Firstly, we build a new statistic which is neither a cumulant
nor a moment, but is chosen in order to be as discriminating
as possible. Secondly, cyclic statistics have not yet been
used to classify M-QAM (M > 4) signals. Moreover, it
reinforces the idea that several orders must be combined to
achieve modulation classification, which has already been
pointed out in a different manner in [7]. In the sequel,
details are given only for the 4-PSK vs 16-QAM case (4-PSK can
be seen as a 4-QAM), but the method can be extended to more
than two classes.

The paper is organized as follows. In section 2, we derive
the expressions of fourth- and second-order cyclic
correlations for digital modulations. In section 3, the basic
idea of the classifier is explained, and the general structure
of the algorithm is given. Simulations are provided in section
4, where our algorithm is compared to its stationary
counterpart.

2.  Digital  modulations  and  their  cyclic 
multicorrelations 

2.1. Signals of interest

In our study, we are interested in M-QAM modulation
classification (i.e. M-state Quadrature Amplitude Modulation).
The analytic signal representation for these modulations is
given by:

$$s(t) = \sum_k s_k\, q(t - kT_s - t_0)\, \exp(2i\pi f_c t) \qquad (1)$$

where $\{s_k\}$ is a complex-valued, zero-mean, and i.i.d.
symbol sequence, $T_s$ is the symbol duration, $f_c$ is the
carrier frequency, $q(t)$ is the real-valued pulse function,
and $t_0$ is a non-random time shift. In this paper, we do not
stationarize the signals, and consequently, time-dependency
must be taken into account when expressing the temporal
cumulants of (1). In other words, x(t) is modeled as a
cyclostationary process.

The received signal, for which we want to determine the type
of modulation, can be written as:

$$x(t) = s(t) + b(t) \qquad (2)$$

where b(t) is a stationary white Gaussian noise whose power is
unknown.


2.2. Cyclic multicorrelations

Let $c_{x,p+q,p}(t; \boldsymbol\tau)$ be the (p+q)th-order
cumulant-based correlation of the process x(t), defined with p
non-conjugated terms and q conjugated terms, as in [4]:

$$c_{x,p+q,p}(t; \boldsymbol\tau) = \mathrm{Cum}[x(t), x(t+\tau_1), \ldots, x(t+\tau_{p-1}), x^*(t-\tau_p), \ldots, x^*(t-\tau_{p+q-1})] \qquad (3)$$

Since x(t) is almost-cyclostationary, there are at most
countably many values of $\alpha$ for which the so-called
(p+q)th-order cyclic correlation, defined by:

$$c^{\alpha}_{x,p+q,p}(\boldsymbol\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} c_{x,p+q,p}(t; \boldsymbol\tau)\, \exp(-2i\pi\alpha t)\, dt \qquad (4)$$

is non-zero.

Let us detail the expressions at order four (p + q = 4). We
will consider the definition in which there are as many
conjugated as non-conjugated terms (p = q = 2). Applying (4)
to the process (1), it can be readily shown that the modulus
of the cyclic tricorrelation (also called fourth-order
temporal cumulant [1],[8]) is given by:

$$|c^{\alpha}_{s,4,2}(\boldsymbol\tau)| = \left| \frac{C_{s,4,2}}{T_s} \int q(t)\, q(t+\tau_1)\, q(t-\tau_2)\, q(t-\tau_3)\, \exp(-2i\pi\alpha t)\, dt \right| \qquad (5)$$

where $C_{s,4,2}$ is the stationary fourth-order cumulant of
the random sequence $\{s_k\}$:
$C_{s,4,2} = \mathrm{Cum}[s_k, s_k, s_k^*, s_k^*]$. It is
necessary to consider the modulus, in order to avoid terms
depending on $t_0$ and $f_c$, which are a priori both unknown.

Similarly, at order two, the modulus of the cyclic correlation
of the process (1) is:

$$|c^{\alpha}_{s,2,1}(\tau)| = \left| \frac{C_{s,2,1}}{T_s} \int q(t)\, q(t-\tau)\, \exp(-2i\pi\alpha t)\, dt \right| \qquad (6)$$

where $C_{s,2,1} = \mathrm{Cum}[s_k, s_k^*]$.
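The moduli of the second- and fourth-order cyclic correlations can be evaluated numerically once the pulse is known. The sketch below is a discrete-time illustration under several assumptions that are not taken from the paper: a rectangular pulse of TS = 10 samples, the cycle frequency alpha = 1/TS, a sum over samples in place of the integral, a scale factor of the form C / TS in front of the pulse term, and hypothetical function names.

```python
import numpy as np

TS = 10                  # samples per symbol (assumed value)
ALPHA = 1.0 / TS         # cycle frequency in cycles per sample (assumed choice)

def q(t):
    """Rectangular pulse: 1 on 0 <= t < TS, 0 elsewhere (assumption)."""
    t = np.asarray(t)
    return ((t >= 0) & (t < TS)).astype(float)

def cyclic_c21_modulus(tau, c21=1.0):
    """Modulus of the second-order cyclic correlation for an integer lag tau."""
    t = np.arange(TS)    # q(t) vanishes outside 0 .. TS-1
    kernel = q(t) * q(t - tau)
    return abs(c21 / TS * np.sum(kernel * np.exp(-2j * np.pi * ALPHA * t)))

def cyclic_c42_modulus(tau1, tau2, tau3, c42):
    """Modulus of the fourth-order cyclic correlation for integer lags."""
    t = np.arange(TS)
    kernel = q(t) * q(t + tau1) * q(t - tau2) * q(t - tau3)
    return abs(c42 / TS * np.sum(kernel * np.exp(-2j * np.pi * ALPHA * t)))

# Usage: second-order modulus at lags 0 and TS/2, fourth-order modulus on a slice
zero_lag = cyclic_c21_modulus(0)
half_lag = cyclic_c21_modulus(5)
slice_val = cyclic_c42_modulus(0, 5, 5, c42=-0.68)
```

With a rectangular pulse the second-order term vanishes at lag 0 for this cycle frequency, because the complex exponential is summed over a full symbol period, and it is largest near half a symbol. This is one way to see why the features are studied as functions of the lag rather than at a single lag.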


3.  Classification  strategy 
3.1.  The  basic  idea 

It has been shown in [5] that the cyclic correlations
$c^{\alpha}_{x,p+q,p}$ are useless in a modulation
classification task for QAM signals, because they lead to
proportional functions whatever p, q, and $\alpha$. Besides,
it has been shown in [3] that the tricorrelations of the
stationarized signal are non-proportional functions of
$\tau$, and so make classification possible even if the power
of the signal is unknown. Moreover, we pointed out in [5] that
cumulant-based stationary and cyclic tricorrelations
correspond to two different arrangements of the same fourth-
and second-order statistics. Since it is clear that one of
these arrangements makes classification possible, whereas the
other does not, it seems natural to search for the combination
of fourth- and second-order statistics which is optimal with a
view to modulation classification.

Following this principle, our discriminating feature will be:

$$f_x(\tau_1, \tau_2, \tau_3, \tau_4, \lambda) = |c^{\alpha}_{x,4,2}(\tau_1, \tau_2, \tau_3)| + \lambda\, |c^{\alpha}_{x,2,1}(\tau_4)|^2 \qquad (7)$$

where the parameter $\lambda$ should be determined in order
to maximize the distance between the features obtained for x
and y if x and y are different modulation types. As for
determining the most appropriate cycle frequency, a natural
choice is $\alpha = 1/T_s$, since the effect of stationary
noise is thus avoided, and since the energy at higher
harmonics of $1/T_s$ is less significant. In order to get a
correlation receiver which does not need threshold
computation, we normalize (7) and the new feature becomes:

$$F_x(\boldsymbol\tau, \lambda) = \frac{f_x(\boldsymbol\tau, \lambda)}{\sqrt{\int_{\boldsymbol\tau} f_x^2(\boldsymbol\tau, \lambda)\, d\boldsymbol\tau}} \qquad (8)$$

where $\boldsymbol\tau = (\tau_1, \tau_2, \tau_3, \tau_4)$.


3.2. Determination of the practical discriminating feature

Since (8) is a five dimensional pattern (with respect to
$\boldsymbol\tau$ and $\lambda$), we propose a feature of
lower complexity by working only with a slice of
$F_x(\boldsymbol\tau, \lambda)$. We keep only a line of the
three dimensional tricorrelation
$c^{\alpha}_{x,4,2}(\tau_1, \tau_2, \tau_3)$, and impose that
this line contains the origin. In other words, $\tau_1$,
$\tau_2$ and $\tau_3$ can be parametrized as:

$$\tau_1 = a\tau, \quad \tau_2 = b\tau, \quad \tau_3 = c\tau. \qquad (9)$$

Besides, we impose that the correlation
$c^{\alpha}_{x,2,1}(\tau_4)$ is a function of the running
parameter $\tau$ defined in (9), and so the resulting feature
is given by:

$$G_x(a, b, c, \lambda, \tau) = \frac{g_x(a, b, c, \lambda, \tau)}{\sqrt{\int_{\tau} g_x^2(a, b, c, \lambda, \tau)\, d\tau}} \qquad (10)$$

where

$$g_x(a, b, c, \lambda, \tau) = f_x(a\tau, b\tau, c\tau, \tau, \lambda) = |c^{\alpha}_{x,4,2}(a\tau, b\tau, c\tau)| + \lambda\, |c^{\alpha}_{x,2,1}(\tau)|^2 \qquad (11)$$

Now, the four parameters a, b, c and $\lambda$ should ideally
maximize the distance between $G_x$ and $G_y$. Since $G_x$ and
$G_y$ are normalized, a, b, c and $\lambda$ should
equivalently minimize the correlation coefficient:

$$\rho = \int_{\tau} G_x(a, b, c, \lambda, \tau)\, G_y(a, b, c, \lambda, \tau)\, d\tau \qquad (12)$$

Consequently, given two types of modulation x and y, the first
step is to use the theoretical expressions of the cyclic
fourth- and second-order correlations (5) and (6) in order to
compute (10) and then determine $(a, b, c, \lambda)_{opt}$
such that:

$$(a, b, c, \lambda)_{opt} = \arg\min(\rho) \qquad (13)$$

To achieve this step, it is of course necessary to know a
priori the symbol duration $T_s$ and the pulse function q(t).


3.3. Comparison between theoretical and estimated features

Following [3], the two theoretical expressions
$G_x((a, b, c, \lambda)_{opt}, \tau)$ and
$G_y((a, b, c, \lambda)_{opt}, \tau)$ are compared to
$\hat G((a, b, c, \lambda)_{opt}, \tau)$ (which is the feature
estimated on the received signal) by a matched filter
technique as described in figure 1. The outputs of the two
matched filters are compared, and the strongest correlation
measure provides the recognized modulation.


Figure 1. The x vs. y classification system


4. Application to 4-PSK vs. 16-QAM classification

4.1. Computation of optimal parameters.


We now suppose x = 4-PSK and y = 16-QAM. The correlation
coefficient (12) is easily computable using (10), (5) and (6)
with:

$$\text{4-PSK}: \ C_{s,4,2} = -1, \qquad \text{16-QAM}: \ C_{s,4,2} = -0.68 \qquad (14)$$

We also suppose that the pulse function of the modulations is
given by: $q(t) = 1$ if $t \in [0, T_s)$ and $q(t) = 0$
elsewhere.

The minimization of (12) is then performed with the SIMPLEX
algorithm, which leads to the following optimal parameters:

$$(a, b, c, \lambda)_{opt} = (0.04, 0.99, 0.84, -2.87) \qquad (15)$$

and the corresponding correlation coefficient is given by:

$$\rho_{min} = -0.06. \qquad (16)$$

The optimal values for a, b and c given in (15) are not
suitable for discrete-time series, since estimation can be
performed only for integer lags. Consequently, we are
compelled to use sub-optimal parameters, which are the integer
values of a, b and c nearest to the optimal ones, i.e.
(a, b, c) = (0, 1, 1). Once these parameters are fixed,
minimization of (12) is performed again with respect to
$\lambda$ (cf. figure 2), which leads to:

$$\lambda_{opt} = -2.94 \quad \text{and} \quad \rho_{min} = 0.038. \qquad (17)$$
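The one-dimensional search over the weighting parameter can be sketched numerically. The code below is only an illustration under explicit assumptions that are not confirmed by the paper: a rectangular pulse of TS = 10 samples, integer lags from 0 to TS, unit symbol power, fourth-order symbol cumulants of -1 (4-PSK) and -0.68 (16-QAM), and a feature of the form "fourth-order modulus plus lambda times squared second-order modulus". The function names are hypothetical, and the numbers it produces are not claimed to reproduce (15) to (17).

```python
import numpy as np

TS = 10                          # samples per symbol (assumed)
ALPHA = 1.0 / TS                 # cycle frequency (assumed)
TAUS = np.arange(0, TS + 1)      # integer lags of the running parameter

def pulse_term(delays):
    """|(1/TS) sum_t prod_d q(t - d) exp(-2i pi alpha t)| for a rectangular pulse."""
    t = np.arange(TS)            # support of q(t) itself
    kernel = np.ones(TS)
    for d in delays:
        kernel = kernel * (((t - d) >= 0) & ((t - d) < TS))
    return abs(np.sum(kernel * np.exp(-2j * np.pi * ALPHA * t)) / TS)

def feature(c42, lam, a=0, b=1, c=1, c21=1.0):
    """Normalized feature along the line (a tau, b tau, c tau)."""
    g = np.array([abs(c42) * pulse_term([-a * tau, b * tau, c * tau])
                  + lam * (c21 * pulse_term([tau])) ** 2
                  for tau in TAUS])
    return g / np.sqrt(np.sum(g ** 2))

def rho(lam):
    """Correlation coefficient between the 4-PSK and 16-QAM features."""
    return float(np.sum(feature(-1.0, lam) * feature(-0.68, lam)))

# Usage: coarse scan of the weighting parameter
lams = np.linspace(-6.0, 0.0, 121)
rhos = np.array([rho(l) for l in lams])
lam_best = float(lams[np.argmin(rhos)])
```

For a weight of zero the two features are proportional and the coefficient equals 1, which illustrates why a pure-order cyclic statistic cannot separate the two modulations; a negative weight makes the two normalized patterns differ in shape and lowers the coefficient.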


Figure  2.  Correlation  coefficient 

The corresponding discriminating features (10) are shown in
figure 3.

4.2.  Simulations  and  discussion 

Simulations have been performed on synthetic data in white
Gaussian noise for different signal to noise ratios S/N
(S/N = 0 and 5 dB). The number of transmitted symbols $N_s$
varies from 50 to 5000 symbols, with $T_s = 10$ samples per
symbol (i.e. 500 to 50000 samples). For each pair (S/N,
$N_s$), 500 different signals (different symbol sequences and
noise samples) are generated for each modulation. Figures 4
and 5 give the performance (probability of correct
classification (Pcc) in %) obtained for different $N_s$ and
for a given S/N.


Figure 3. Discriminating features (-): 4-PSK, (- -): 16-QAM


On the same figures, we report the performance of a classifier
built on exactly the same principles discussed in section 3,
but where the discriminating features (10) are based on
stationary fourth- and second-order moment-based correlations,
i.e.:

$$\tilde g_x(a, b, c, \lambda, \tau) = m_{\tilde x,4,2}(a\tau, b\tau, c\tau) + \lambda\, \left(m_{\tilde x,2,1}(\tau)\right)^2 \qquad (18)$$

where $\tilde x(t)$ is a stationarized version of the
cyclostationary process x(t). Details about this classifier
can be found in [6], but it should be stressed that the
minimum correlation coefficient is:

$$\rho_{min} = 0.54 \qquad (19)$$

This correlation coefficient is much higher than the one
given in (17), which means that the patterns extracted in the
cyclic domain are theoretically more discriminating than the
ones obtained with a stationary model. However, figures 4 and
5 show that simulated performance is better if a stationary
model is adopted. This can be explained as follows. The
performance of the classifier depends not only on the distance
between the discriminating features, but also on the variance
of the estimates of these features. It is well known that the
variance of the estimates is higher in the cyclic domain than
in the stationary one, and taking the modulus of the cumulants
in (7) renders the cyclic classifier more sensitive to noise
(both estimation variance and thermal noise).

One can remark that the previous performance is immune to
white noise because $\tau = 0$ is avoided in the stationary
case. In the case of coloured noise, however, one can expect
the cyclic approach to perform better.


Figure 4. Performance of the classifier, S/N=0dB, (-): cyclic
model, (- -): stationary model

[Horizontal axis: number of symbols]

Figure 5. Performance of the classifier, S/N=5dB, (-): cyclic
model, (- -): stationary model


5. Conclusion

We have presented a new principle of modulation classifier,
which is derived from maximum likelihood techniques. It is new
with respect to the kind of feature that is processed to
achieve the classification. This feature is an ad hoc function
which has to be built for each class we want to discriminate.
Properly mixing fourth- and second-order statistics can result
in features that are more or less discriminating, depending on
the model adopted for the stochastic process (stationary or
cyclostationary). However, it has been shown that the accuracy
of the classification is also highly conditioned by the
variance of the estimated features.

Abstract

A sub-optimal nonlinear time-recursive filter is developed
which considers an arbitrary number of moments of the
conditional density. The filter assumes a quadratic truncation
of the system dynamics and measurement functions and retains N
moments, thus requiring knowledge of up to 2N + 2 a priori
moments and N + 1 moments of the measurement noise process,
which may be non-Gaussian. Prediction and update relations are
given for moments of arbitrary order, along with mechanisms
which facilitate their closed forms. Numerical examples are
given for both scalar and vector systems and show promising
results.


1  Introduction 

It is seen, in tracking problems and indeed in all stochastic
frameworks, that nonlinearity inherently breeds
non-Gaussianity and subsequently non-negligible higher order
statistics for the conditional density of the state estimate.
For this discussion, a generalised sampled continuous model
for nonlinear stochastic filtering is considered.
Specifically, nonlinear system dynamics and measurement
functions are considered along with non-Gaussian disturbances.
The state dynamics and measurement functions may be described
thus,

$$\dot{x} = f(x) + g\,w \qquad (1)$$

$$y_k = h(x_k) + v_k \qquad (2)$$

where $g$ is a scalar constant and $w$ and $v_k$ are the
process and measurement disturbances respectively. Using
knowledge of $f(x)$ and $h(x_k)$, a general filter structure
is constructed as in Figure 1. The estimate for the state is
described by a probability density function
$p_x(x_k \mid Y_k)$ conditioned on the measurement set $Y_k$.
This conditional density is parameterised or approximated such
that relations for the prediction and correction of the state
estimate may be formulated.


Figure 1. Filter Block Diagram.


The simplified case where the system and measurement
equations are linear or linearised and the disturbances are
Gaussian has previously been addressed by Kalman [7], and
closed form expressions for the filter parameters have been
derived. Unfortunately, the general solution to the recursive
filtering problem for non-linear/non-Gaussian systems is
itself ill-posed, with either parameterisations or
computations of the posterior conditional probabilities
approaching infinite orders.

What is required is a set of recursive closed-form
relationships upon a fixed and finite number of parameters.
Subsequently, certain assumptions are made in order to
facilitate a tractable solution, at the cost of compromising
the optimality of the estimates. These sub-optimal filters
range widely in the fundamental assumptions and structures
that are imposed [1, 3, 5, 6, 9, 10].

Generally, these methods suffer either from large parameter
set requirements or considerable analytical complexity
when considering vector systems. As a consequence, the
literature regarding these more involved nonlinear filtering
methods considers only simple examples with scalar systems
and measurements [1, 5, 6]. A more recent and notable exception
is [2] in which examples containing vector systems
and  scalar  measurement  functions  are  investigated. 

It remains to devise a method by which the filter parameterisation
and computational requirements may be kept at
a reasonable level when applied to practical problems with
state  and  measurement  vectors  of  non-trivial  dimension. 


2  Higher  order  moment  based  filtering 


The approach proposed by the authors extends the truncated
and Gaussian second-order filters developed by
Jazwinski [6] by considering the effect of
higher order moments on the filtering solution.

For these second-order filters, a Gaussian form
is assumed for the conditional density and noise densities,
while the system dynamics and measurement functions are
approximated by a Taylor series expansion which retains
up to the quadratic terms. These quadratic nonlinearities
introduce terms in the estimate and error covariance update
relations which include higher-order moments of the conditional
density and the noise processes.

Facilitation  of  closed  form  relations  requires  that  certain 
closure  methods  be  employed  which  expand  or  extrapolate 
the  required  higher-order  moments  in  terms  of  the  error 
covariance.  Specifically,  the  truncated  second-order  filter 
assumes  that  the  higher-order  moments  are  negligible  and 
may  be  set  to  zero,  while  assuming  a  Gaussian  extrapolation 
yields  the  Gaussian  second-order  filter.  Jazwinski  notes  that 
these  filters  frequently  produce  negative  error  variances  due 
to  the  nature  of  the  assumptions  imposed. 

Alternatively, consider the relaxation of the Gaussianity
assumption through the inclusion of update relations for
moments of up to some arbitrary order ($N$). The quadratic
truncation of the dynamics and measurement functions is
maintained, as is the assumption of Gaussianity for the process
noise ($w$). Non-Gaussian measurement disturbances
are inherently included in this filter structure, by considering
the moments of the noise process ($v_k$).

The filter structure may be sectioned into 2 parts: the prediction,
in which the evolution or diffusion of the moments
between observations is considered, and the correction or
measurement update. In order to maintain brevity, the formulation
for scalar systems will be discussed herein, with
the vector formulations being treated briefly [11].


3  Moment  Prediction 


The diffusion relations for the $i$th central moment ($m_i$)
may be formulated by considering the derivative of the polynomial
expansion of its fundamental definition. Proceeding,

$\frac{d}{dt}(m_i) = \frac{d}{dt}\,\mathrm{E}\left[(x - \mathrm{E}[x])^i\right]$  (3)


which  gives. 


J=1 


dt 


d 


-(E[x-^])E[x]+iE[x«-^]J(E[x]) 


(4) 


where  Cij  is  the  coefficient  for  the  j  th  term  of  an  i  th  order 
polynomial.  Evaluating  the  remaining  quantities  using  [8, 
Eqn.  (3.6)]  and  [6,  Lemma  6.1]  respectively  gives. 


$\mathrm{E}[x^k] = \sum_{l=0}^{k} C_{k,l}\, m_{(k-l)}\, \mathrm{E}[x]^{l}$  (5)


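and,

$\frac{d}{dt}(\mathrm{E}[x^k]) = k \sum_{i=0}^{k-1} C_{k-1,i}\, \mathrm{E}[x]^i \left( f\, m_{(k-i-1)} + f_x\, m_{(k-i)} + \tfrac{1}{2} f_{xx}\, m_{(k-i+1)} \right)$
$\qquad + \tfrac{1}{2} k(k-1)\, g^2 q \sum_{i=0}^{k-2} C_{k-2,i}\, m_{(k-i-2)}\, \mathrm{E}[x]^i$ ,  (6)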


where $q$ is the variance of the process noise. Also $f$, $f_x$
and $f_{xx}$ are the dynamics function and its first and second
partial derivatives respectively, evaluated at the current state
estimate. From these relations it is evident that in order
to predict $N$ moments one requires knowledge of $(N + 1)$
moments. Hence, closure methods are employed in order to
calculate the $(N + 1)$th moment.
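
As a hedged illustration of the prediction step, the following Python sketch evaluates the rate of change of the $k$th raw moment, $\frac{d}{dt}\mathrm{E}[x^k] = k\,\mathrm{E}[x^{k-1} f(x)] + \tfrac{1}{2}k(k-1)g^2 q\,\mathrm{E}[x^{k-2}]$, with $f$ replaced by its quadratic expansion about the current estimate. The function name and argument layout are assumptions made for this sketch and are not taken from the paper; $C_{k,i}$ is taken to be the binomial coefficient.

```python
from math import comb


def raw_moment_rate(k, x_hat, m, f, f_x, f_xx, g, q):
    """Rate of change of E[x^k] under a quadratic expansion of f about x_hat.

    m[n] is the n-th central moment (m[0] = 1, m[1] = 0). The list must
    extend to index k + 1, which is why a closure rule is needed.
    f, f_x, f_xx are the dynamics function and its derivatives at x_hat.
    """
    # Drift part: k * E[x^(k-1) * f(x)] with f expanded to second order.
    drift = sum(
        comb(k - 1, i) * x_hat ** i
        * (f * m[k - i - 1] + f_x * m[k - i] + 0.5 * f_xx * m[k - i + 1])
        for i in range(k)
    )
    # Diffusion part: E[x^(k-2)] written in terms of central moments.
    diffusion = sum(
        comb(k - 2, i) * m[k - i - 2] * x_hat ** i for i in range(k - 1)
    )
    return k * drift + 0.5 * k * (k - 1) * g ** 2 * q * diffusion
```

For the linear test case $f(x) = -x$ the second raw moment should obey $\frac{d}{dt}\mathrm{E}[x^2] = -2\,\mathrm{E}[x^2] + g^2 q$, which gives a quick sanity check of the expression.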


4  Moment  Correction 


One may calculate the posterior or corrected moments
($m_i^+$) by considering the first order Power Series expansion
about the measurement residual $(y - \mathrm{E}[y])$,

$\mathrm{E}[x]^+ = a_1 + b_1 (y - \mathrm{E}[y])$  (7)

$m_i^+ = a_i + b_i (y - \mathrm{E}[y])$  (8)

where the coefficients are expressed as a polynomial expansion
in terms of the predicted moments ($m_i^-$), the moments
($r$) of the measurement noise and the first and second partial
derivatives of the measurement function.

Utilising [6, Eqn. 9.21], expressions for $a_1$ and $b_1$ are
evaluated thus.

$a_1 = \mathrm{E}[x]$  (9)

$b_1 = \mathrm{E}[(x - \mathrm{E}[x])(y - \mathrm{E}[y])] \times \mathrm{E}[(y - \mathrm{E}[y])^2]^{-1}$  (10)
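Also, $a_i$ and $b_i$ are expressed as,

$a_i = \sum_{j=0}^{i} (-1)^j C_{ij} (b_1)^j\, \mathrm{E}[(x - \mathrm{E}[x])^{(i-j)} (y - \mathrm{E}[y])^{j}]$  (11)

$b_i = \sum_{j=0}^{i} (-1)^j C_{ij} (b_1)^j\, \mathrm{E}[(x - \mathrm{E}[x])^{(i-j)} (y - \mathrm{E}[y])^{(j+1)}] \times \mathrm{E}[(y - \mathrm{E}[y])^2]^{-1}$  (12)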

with  all  of  the  expectations  conditioned  on  the  set  of  previous 
measurements.  The  remaining  expectations  are  of  the  form, 

E[(a;-E[x])'=(y-E[y])']  = 

I  m  j 

m=0  n=0 

n 

X  '^,{~^y^‘^<P^(k+l-m+2n-2v)  (^2  Y 

p=0 

From  these  relations  it  is  seen  that  in  order  to  compute 
the  ith  posterior  moment  one  requires  (2i  +  2)  a  priori 
moments.  This  is  by  far  a  more  serious  problem  than  that 
for  the  case  of  moment  prediction. 

5  Closure  Methods 

It remains to derive a method by which higher-order moments
may be generated in terms of lower-order moments.
Consider, as an alternative to the truncated and Gaussian
closure, a method which utilises a cumulant based expansion.
Specifically, if $N$ moments are considered by the filter,
the moments of order greater than $N$ are calculated by setting
the cumulants of order $N + 1$ and greater to zero. For
example, consider the expansion of the 5th cumulant ($\kappa_5$)
in terms of the central moments [8].

$\kappa_5 = m_5 - 10\, m_3 m_2$

Setting $\kappa_5 = 0$ and solving yields,

$m_5 = 10\, m_3 m_2$

This  process  is  generalised  such  that  the  required  higher- 
order  moments  may  be  calculated. 
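
The generalisation can be sketched in Python using the standard recursion between central moments and cumulants, $m_n = \sum_{k=1}^{n} \binom{n-1}{k-1} \kappa_k m_{n-k}$ with $\kappa_1 = 0$. This is a minimal illustration under that assumption; the function names are hypothetical and the paper does not state which form of the moment-cumulant relation was implemented.

```python
from math import comb


def cumulants_from_central_moments(m):
    """Cumulants kappa[0..N] from central moments m[0..N] (m[0]=1, m[1]=0)."""
    kappa = [0.0] * len(m)
    for n in range(2, len(m)):
        # Invert m_n = sum_{k=1}^{n} C(n-1, k-1) * kappa_k * m_{n-k} for kappa_n.
        kappa[n] = m[n] - sum(
            comb(n - 1, k - 1) * kappa[k] * m[n - k] for k in range(1, n)
        )
    return kappa


def close_next_moment(m):
    """Central moment of order N+1 obtained by setting kappa_{N+1} = 0."""
    n = len(m)  # order of the moment being closed
    kappa = cumulants_from_central_moments(m)
    return sum(comb(n - 1, k - 1) * kappa[k] * m[n - k] for k in range(1, n))
```

With four retained moments the function reproduces the fifth-order example above, and with Gaussian moments it returns the usual Gaussian values, so the Gaussian closure appears as a special case.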

6  Vector  Systems 

Figure 2. Example 1: A typical realisation.

Figure 3. Example 1: MSE for 500 Monte Carlo simulations.

For vector systems, a novel tensor notation is utilised such
that the symmetric nature of the moments is considered [4],
and the quantities are stored in a vector form. For example,
consider the third moment of a 2-state system as,

$P_3 = \left[\, \mathrm{E}[x_1^3] \;\; 3\mathrm{E}[x_1^2 x_2] \;\; 3\mathrm{E}[x_1 x_2^2] \;\; \mathrm{E}[x_2^3] \,\right]^T$.  (14)

The algebra surrounding the notation in Equation (14) contains
a number of useful properties. One property which is of
particular value is that of commutativity, such that the polynomial
expansions present in the derivation for the scalar
system are also valid for the vector case. Additionally, all
quantities  manipulated  by  the  filter  are  standard  vectors  and 
matrices,  thus  avoiding  complicated  tensor  notations  and 
issues. 
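
The storage scheme of Equation (14) can be illustrated with a short Python sketch that builds the stacked vector of distinct moments from samples, weighting each entry by the number of index permutations it represents. The sample-average estimator and the function name are assumptions made for illustration only; the filter itself propagates these quantities analytically rather than from samples.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod


def stacked_moment(samples, order):
    """Vector of distinct order-th moments, each scaled by its multiplicity.

    samples is a list of state tuples. For two states and order 3 the result
    is [E[x1^3], 3E[x1^2 x2], 3E[x1 x2^2], E[x2^3]].
    """
    n_states = len(samples[0])
    entries = []
    for idx in combinations_with_replacement(range(n_states), order):
        # Number of distinct orderings of this index multiset.
        mult = factorial(order)
        for count in Counter(idx).values():
            mult //= factorial(count)
        mean_prod = sum(prod(s[i] for i in idx) for s in samples) / len(samples)
        entries.append(mult * mean_prod)
    return entries
```

Storing only the distinct entries reduces the third moment of a 2-state system from eight tensor elements to four vector elements.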

7  Numerical  Analysis 

To  illustrate  the  effectiveness  of  the  higher  order  moment 
techniques  three  numerical  examples  are  investigated. 
Figure 4. Example 2: Estimates for a typical realisation.


Figure 5. Example 2: Errors for a typical realisation.


7.1  Example  1 

Consider firstly a scalar example in which estimation of a
constant state from noisy measurements of its squared value
is investigated, such that,

$f(x) = 0$ and $y_k = x_k^2 + v_k$.  (15)

An experiment is constructed for which the true state $x_0 = 3$
and the standard deviation of the measurement noise $\sigma_v = 0.1$.
The initial filter estimate is chosen $\hat{x}_0 \sim \mathcal{N}(3, 4)$ such
that the nonlinearity effects are accentuated.

Figure 2 shows a typical realisation of the estimate for
the proposed method, for $N = 2$, and the Extended Kalman
Filter (EKF). The EKF clearly shows its characteristically biased
convergence while a marked improvement is achieved
for the proposed method. The improvement is also illustrated
by the Monte-Carlo analysis results for 500 realisations
given in Figure 3. Results for 4 and 6 moments offer
little improvement over the 2 moment filter for this case.

Figure 6. Example 2: Realisation with biased estimates.

Figure 7. Example 2: MSE for 500 Monte Carlo simulations.
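
For reference, the EKF baseline of this example can be sketched in a few lines of Python. This is a textbook scalar EKF written under stated assumptions (no process noise, initial estimate drawn from a Gaussian with mean 3 and variance 4, hypothetical function and parameter names); it is not the authors' simulation code and does not implement the proposed moment filter.

```python
import random


def ekf_squared_measurement(x_true=3.0, sigma_v=0.1, x0_mean=3.0, p0=4.0,
                            steps=50, seed=0):
    """Scalar EKF for a constant state observed through y = x^2 + v."""
    rng = random.Random(seed)
    x_hat = rng.gauss(x0_mean, p0 ** 0.5)  # initial estimate
    p = p0                                 # initial error variance
    r = sigma_v ** 2                       # measurement noise variance
    history = []
    for _ in range(steps):
        y = x_true ** 2 + rng.gauss(0.0, sigma_v)
        # Prediction: f(x) = 0 and no process noise, so x_hat and p carry over.
        h_x = 2.0 * x_hat                  # measurement Jacobian at the estimate
        s = h_x * p * h_x + r              # innovation variance
        gain = p * h_x / s
        x_hat = x_hat + gain * (y - x_hat ** 2)
        p = (1.0 - gain * h_x) * p
        history.append((x_hat, p))
    return history
```

Because the gain is computed from a linearisation about the current estimate, the reported variance shrinks regardless of whether the estimate is actually close to the true state, which is one source of the biased convergence noted above.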

7.2  Example  2 

Consider a more interesting scalar system, in which
both the dynamics and the measurement functions contain
higher than quadratic nonlinearities. Namely,

dx 

=  -cos®  (ox) +  6  (16) 

Vk  =  xl  +  Vk  (17) 

where $a = 7$ and $b = 1.5$. The initial estimate is $\hat{x}_0 \sim \mathcal{N}(5, 1)$
and $\sigma_v = 2$. Figures 4 and 5 show the respective estimates
and errors for a typical realisation ($N = 6$). While the
improvement is marginal for this case, the bias effects as
shown in Figure 6 are largely avoided. Monte Carlo analysis
in Figure 7 shows that a suitable number of moments is 4,
as the 6 moment filter offers negligible improvement. The
oscillations in the MSE for the 4 and 6 moment filters are
attributable to the neglected higher-order nonlinearities in
the system dynamics.

Figure 8. Example 3: Estimates for x.

7.3  Example  3 

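Consider a simple 2 dimensional vector system with a
constant state,

$X = [x \;\; y]^T$  (18)

where range and bearing angle measurements are available for
filtering,

$\mathbf{y}_k = \begin{bmatrix} \sqrt{x_k^2 + y_k^2} \\ \tan^{-1}(y_k / x_k) \end{bmatrix} + \begin{bmatrix} v_{1k} \\ v_{2k} \end{bmatrix}$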

with standard deviations for the range and bearing measurement
noises being 30 metres and 0.015 radians respectively.
The initial estimate is chosen such that $\hat{X}_0 \sim \mathcal{N}([3 \;\; 3]^T, I)$.
Figures 8 and 9 show some preliminary results for a 4 moment
filter for vector systems. These results show improved
convergence over the EKF in the estimates for both states
and motivate continuing research into vector higher-order
filters.
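
The range and bearing measurement function and the first derivatives required by any Taylor-series based filter can be sketched as follows. The use of `atan2` in place of $\tan^{-1}(y/x)$ is an assumption made here to keep the bearing well defined in all quadrants, and the function names are illustrative.

```python
from math import atan2, hypot


def range_bearing(x, y):
    """Noise-free range and bearing of the point (x, y) from the origin."""
    return hypot(x, y), atan2(y, x)


def range_bearing_jacobian(x, y):
    """First partial derivatives of (range, bearing) with respect to (x, y)."""
    r2 = x * x + y * y
    r = hypot(x, y)
    return [[x / r, y / r],
            [-y / r2, x / r2]]
```

Both rows of the Jacobian become singular at the origin, so an initial estimate near $[0 \;\; 0]^T$ would need separate handling.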

8  Conclusion 

A sub-optimal nonlinear time-recursive filter was developed
which considers an arbitrary number of moments of
the conditional density for the filter estimate. It was seen
that, under a quadratic truncation of the system dynamics
and measurement functions, a filter which retains $N$
moments requires knowledge of up to $2N + 2$ a priori moments
and $N + 1$ moments of the measurement noise process.
Prediction and update relations were given for moments of
arbitrary order, along with mechanisms which facilitate their
closed forms. Numerical examples were given for both scalar
and vector systems and show promising results.
Abstract

In this paper, two isolated word recognition methods
based on high-order statistics and a time-delay neural network
(TDNN) for recognition of Farsi spoken digits have
been studied. The adopted speech recognition system
consists of four modules, namely, preprocessor, end-points'
detector, feature extractor and classifier. The first
method estimates the AR parameters of speech based on
the third- and fourth-order cumulants using high-order
Yule-Walker, W-slice and 1-D slice approaches. In the
second method, statistical features are extracted from the
estimated high-order probability density function (pdf) of
thresholded amplitude features. For each pdf estimate,
the values of mean, variance, third order moment and
entropy are computed. The total number of features for
each frame of approximate length of 15 ms is 16. The
adopted TDNN has 16 nodes in its input layer, 10 nodes
in its output layer and two hidden layers. The learning
rule of the adopted TDNN, which is based on the
back-propagation rule, has been modified to decrease the
training time. Computer simulation results obtained
from recognizing 10 Farsi digits spoken by different
speakers show that the first method has a better recognition
rate while the second method necessitates less computation.


1.  Introduction 

Speech is a means that human beings use to
communicate intention and emotion. Therefore speech
signal processing techniques including automatic speech
recognition have attracted attention as practical application
areas widen. As classical methods, Dynamic Time Warping
(DTW) and Hidden Markov Models (HMM)
have been used for speech recognition. In recent years,
automatic speech recognition using Artificial Neural
Networks (ANN) has been studied by many researchers.
A number of recognition systems have employed original
or modified versions of the Time Delay Neural Network
(TDNN) as defined by Waibel, and some of these methods
outperform the conventional HMM approach [1], [2].

Classical speech analysis techniques for feature extraction
are based on second-order statistics and their
performance dramatically decreases when noise is present
in the signal under analysis.

In  the  last  few  years  there  has  been  an  increasing 
interest  in  the  application  of  High-Order  Statistics  (HOS) 
in  speech  recognition.  Paliwal  [3]  made  use  of  HOS  in  a 
recognition  system  and  he  showed  that  results  remain 
constant under a great variability of SNR. Results show
that speech signals can be characterized not only by their
autocorrelation but also by their third- and fourth-order
cumulants. Cumulants of order greater than two are zero
for  white  and  colored  Gaussian  noise.  Analysis  of  noisy 
speech  signals  based  on  HOS  permits  separation  of 
speech  from  noise. 

In this paper two HOS based methods for Farsi isolated
digit recognition have been studied. The first method
adopts the estimation of AR parameters using HOS techniques.
The second method employs features based on the
estimates of high order pdf.

2.  AR  Parameters  Estimation  Using  HOS 

From the high-order Yule-Walker algorithm [5] the
following set of equations is obtained;

$\sum_{l=0}^{p} a(l)\, C_{k,y}(m - l, k_0, 0, \ldots, 0) = 0$  (1)
To solve these equations, it is necessary to concatenate
p+1 slices, $k_0 = -p, \ldots, 0$ in (1) because a single slice does
not guarantee a full rank system of equations. To improve
the stability, two modifications have been proposed:
to increase the number of slices or to increase the number
of equations per slice [4].

The w-slice algorithm [6] is based on the following
weighted sum of cumulant slices;

$C_w(i) = w_2 C_{2,y}(i) + \sum_{j=-L}^{N} w_3(j)\, C_{3,y}(i,j) + \sum_{j=-L}^{N} \sum_{k=-L}^{N} w_4(j,k)\, C_{4,y}(i,j,k) + \ldots$  (2)

and it is developed in three steps:
a) Choose $w_2$, $w_3(j)$ and $w_4(j,k)$ so that:

$C_w(i) = 0, \quad i = -P, \ldots, -1$
$C_w(0) = 1$  (3)

where P>p, N>0 and L>p+M, where M is the
overdetermination.
b) Estimate the first P terms of the impulse
response from the weighted cumulant $C_w(i)$.

$h(i) = C_w(i), \quad i = 1, \ldots, P$  (4)

c) Solve the filter coefficients from the following equation:

$\sum_{l=0}^{p} a(l)\, h(i - l) = 0, \quad i = 1, \ldots, P$  (5)

In the 1-D slice algorithm, the AR parameters are
obtained from the single cumulant slice $C_{k,y}(m, k_0, \ldots, 0)$.
A one-dimensional slice does not guarantee a full rank
system of equations and for this reason it is not reasonable
to solve (1) directly since the solution may not be stable
and unique. Instead the autocorrelation of the cumulant
slices is considered [4]. If we multiply (1) by the one-dimensional
slice $C_{k,y}(m - l', k_0, 0, \ldots, 0)$ and we sum in an
interval with m>0, we obtain:

$\sum_{l=0}^{p} a(l) \sum_{m>0} C_{k,y}(m - l, k_0, 0, \ldots, 0)\, C_{k,y}(m - l', k_0, 0, \ldots, 0) = 0$  (6)

The above equation can be expressed as:

$\sum_{l=0}^{p} a(l)\, R_c(l, l') = 0$  (7)

where

$R_c(l, l') = \sum_{m>0} C_{k,y}(m - l, k_0, 0, \ldots, 0)\, C_{k,y}(m - l', k_0, 0, \ldots, 0)$  (8)

Instead of $R_c(\cdot)$ we approximately use $R_o(\cdot)$ that is the
autocorrelation of the causal part of the one slice cumulant
which yields;

$\sum_{l=0}^{p} a(l)\, R_o(l - l') = 0, \quad l' = 1, \ldots, P$  (9)

The coefficients obtained from (9) differ from the true AR
coefficients but they are very useful for speech recognition
tasks. The autocorrelation method that assures a stable
solution also has been considered;

$\sum_{k=0}^{p} a(k)\, R_{H}(k - l) = 0, \quad l = 1, \ldots, p$  (10)


3.  HOS  Based  Feature  Estimation  using  PDF

Let X be a feature vector and let $x_1, x_2, \ldots, x_d$ be the
values that specific features take. Each element $x_i$ of
the vector X is quantized and assumes a finite number of
different values. For example if the feature used is an
amplitude feature with 8 bit resolution (256 levels), then
each $x_i$ can be represented as a vector itself as follows;

Xi=(Xi,,Xi2, . ,XJ  (11) 

As X is a multidimensional vector, there are several
means, variances and moments (first-, second- and third-order
moments). For example we can define several
means as follows;

j=t  lk=t2I=t  dn=t 

M.,  (12) 

j=l  lk=121=l  dn=l 

One problem with the implementation of this model is
the high dimensionality of the estimate, which requires an
impractical amount of memory [8]. To solve this problem,
we build several binary feature vectors from our feature
vector. In our case we generate four binary vectors using
four equally spaced thresholds.

Let f(x) be the amplitude of sample x and let T be the
value of the threshold. The binary vectors g(x) are generated
by applying the condition;

$g(x) = \begin{cases} 1, & \text{if } f(x) > T \\ 0, & \text{if } f(x) < T \end{cases}$  (13)


Each of these binary vectors forms a d digit binary number
from which we can compute its histogram. From each
histogram the mean, variance, third order moment and
entropy are computed.

The values calculated are used as descriptors or features
of a specific word class. For classification we used
neural networks (MLP and TDNN). The neural
networks are trained with these descriptors.
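
The thresholding and histogram steps can be sketched in Python as follows. The sketch assumes that the histogram is accumulated over a collection of binary codes (the text does not say how codes are pooled), that samples equal to the threshold map to 0, and that the entropy uses a base-2 logarithm. All function names are hypothetical.

```python
from collections import Counter
from math import log2


def binary_code(samples, threshold):
    """Pack a d-sample feature vector into a d-digit binary number."""
    code = 0
    for s in samples:
        code = (code << 1) | (1 if s > threshold else 0)
    return code


def histogram_features(codes):
    """Mean, variance, third central moment and entropy of a code histogram."""
    n = len(codes)
    pmf = {z: c / n for z, c in Counter(codes).items()}
    mean = sum(z * p for z, p in pmf.items())
    variance = sum((z - mean) ** 2 * p for z, p in pmf.items())
    third = sum((z - mean) ** 3 * p for z, p in pmf.items())
    entropy = -sum(p * log2(p) for p in pmf.values())
    return mean, variance, third, entropy
```

Applying `histogram_features` once per threshold yields four descriptors per threshold, which with four thresholds gives the 16 features per frame quoted in the abstract.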

4.  Implementation 

To evaluate the performance of the two methods described
earlier, a database composed of utterances of 10
speakers was generated. Each speaker uttered 10 repetitions
of each Farsi digit. Five utterances from each speaker are
used for training and five for testing. Noisy signals are
obtained by adding white Gaussian noise at SNRs of 0, 10 and
20 dB.

For the evaluation of the first method in recognition
of Farsi spoken digits, the autocorrelation algorithm (or
second order Yule-Walker method, YW2), the third- and
fourth-order Yule-Walker algorithms (YW3 and YW4),
the W-slice algorithm (WS3 and WS4) and the 1-D slice
algorithm (DS3 and DS4) were examined. 16 features, namely
cepstrum, delta cepstrum and delta energy, are used in
the recognition system.

The TDNN architecture consists of 2 hidden layers
plus input and output layers. The input layer has 16 nodes
with  4  delays  that  makes  the  total  number  of  input  nodes 
to  be  16*5.  The  first  hidden  layer  has  10  nodes  with  6 
delays.  The  second  hidden  layer  has  10  nodes  with  10 
delays  and  the  output  has  10  nodes  with  no  delay.  The 
configuration  of  the  adopted  TDNN  is  shown  in  Fig.  1. 

The speed of the training of the TDNN has been
significantly increased by modification of its learning
algorithm. This is achieved by adding a linear function to
the sigmoid transfer function of neurons. This prevents
the derivative of the sigmoid transfer function from becoming
zero. Otherwise, in situations when the derivative of the
transfer function is zero or very small, the weights will
not be updated and the learning time will be increased.

Consider the sigmoid transfer function;

$f(x) = \frac{1}{1 + e^{-x}}$  (14)

When a linear function is added to the sigmoid function
such as;

$f_1(x) = f(x) + x$  (15)

$f_1'(x) = f'(x) + 1$  (16)

Thus in this case the derivative of the transfer function
will always be nonzero.
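
The effect of the added linear term can be checked numerically with a short Python sketch. The function names are illustrative, and the unit slope of the linear term follows Equations (15) and (16).

```python
from math import exp


def sigmoid(x):
    """Standard logistic transfer function."""
    return 1.0 / (1.0 + exp(-x))


def sigmoid_prime(x):
    """Derivative of the logistic function, which vanishes for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)


def modified_transfer(x):
    """Sigmoid with an added linear term of unit slope."""
    return sigmoid(x) + x


def modified_transfer_prime(x):
    """Derivative of the modified transfer function, never below 1."""
    return sigmoid_prime(x) + 1.0
```

At x = 20 the plain sigmoid derivative is of the order of 1e-9, so a back-propagated error is almost entirely suppressed, whereas the modified derivative stays at or above 1.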

The recognition rates of the aforementioned methods
are summarized in Table 1.


SNR     clean   20 dB   10 dB   0 dB
YW2     97.5    96.2    79.3    33.4
YW3     97.3    95.9    73.5    23.4
YW4     96.3    96.1    75.4    23.6
WS3     96.2    94.3    80.7    44.9
WS4     97.2    95.1    83.6    48.8
DS3     95.9    93.3    87.6    57.8
DS4     96.1    94.4    89.9    63.1

Table 1. Summary of recognition rates of
Farsi digit recognition (method 1)

In the second method, which exploits high-order pdf
based features, each Farsi digit signal, after end-points'
detection, is divided into equal, non-overlapping frames.
The total number of frames for each digit is chosen to be
100. We pad enough zeroes at the end of each word so
that the number of samples in each frame is an integer.

In the next step we select d equally spaced samples
from each frame. In the preliminary investigation d was
selected to be 10. Although amplitude features were adopted,
other types such as LPC features could be used. For
frames whose length is not a multiple of 10, an interpolation
technique is used to evaluate the values of the appropriate
amplitude features. Each obtained feature vector is
converted to 4 binary vectors using 4 equally spaced
threshold levels. Each binary vector is in fact a 10 binary
digit number ranging from 0 to 1023. The histogram of
each vector is then calculated and from that the first-,
second- and third-order moments and entropy are computed.
The moments are computed from the following equations:

$\mu_n(z) = \sum_{i=1}^{L} (z_i - m)^n P(z_i)$  (17)

where m is the mean value of z (first-order moment)

$m = \sum_{i=1}^{L} z_i P(z_i)$  (18)

The average information or entropy is:

$H(z) = -\sum_{i=1}^{L} P(a_i) \log P(a_i)$  (19)

The total number of features obtained for each frame
is 16. Before application of these feature vectors to the
TDNN, they were compressed by a logarithmic function
to limit their dynamic range. The employed speech database
and the TDNN are the same as used for the first
method. The recognition rate of the second method is not
as good as the rate of the first method, and is about ten
percent lower. Considering the fact that the amount of
computation involved in the second method is much
lower, this result is very encouraging. Some modifications
are going to be implemented to improve the results of
the second method.

Fig. 1. Configuration of the TDNN (layer sizes from output to input: 10*1, 10*11, 10*7, 16*5)


5.  Conclusion

In this paper two methods based on high-order statistics
for recognition of Farsi spoken digits have been
studied. The first method estimates the AR parameters of
speech based on third- and fourth-order cumulants using
Yule-Walker, W-slice and 1-D slice methods and compares
them with the autocorrelation method. The second
method is based on high-order pdf and the first-, second-,
third-order moments and entropy. The first method
showed a better recognition rate while the second method
needs much less computation. Farsi spoken digits have
more similarity to each other than their English counterparts,
thus their recognition rate is lower.
Abstract


We investigate in this paper whether the HOS-based
blind channel estimation method EVI (Eigenvector approach
to blind Identification) can compete with the non-blind
cross-correlation-based scheme used in state-of-the-art
GSM receivers (Global System for Mobile comm.). For
blind, non-blind, and ideal estimates of COST-207 mobile
radio channels, we give simulated bit error rates (BER)
after Viterbi detection in terms of the mean signal-to-noise
ratio (SNR). Averaged over three COST-207 propagation
environments, EVI leads to an SNR loss of 1.2 dB only,
while it saves the 22% overhead in GSM data rate due
to the transmission of training sequences. Since just 142
samples are used for channel estimation, we consider this
performance outstanding for an approach based on HOS.


1.  Linear  time- variant  GSM  channel  model 

Consider the equivalent baseband representation of a
GSM communication system in Figure 1, where source
and channel coding are omitted to enhance clarity. In
the transmitter, coded information bits d(k) together with
reference bits f(k) are assembled into bursts of 142 bits,
where a value from {-1, 1} is taken each bit period T =
48/13 μs ≈ 3.7 μs. Each burst is encoded differentially to
facilitate demodulation. Then, it is modulated by Gaussian
Minimum Shift Keying (GMSK) with $f_{3\mathrm{dB}} = 0.3/T$ and
transmitted over the multipath radio channel.

This research is supported by the German NSF (DFG). Matlab programs
and compressed postscript files of our publications are also available
from our WWW server http://www.comm.uni-bremen.de.

In a mobile scenario, the physical multipath radio
channel is time-variant with a baseband impulse response
depending on the time difference $\tau$ between the observation
and excitation instants as well as the observation time t.
We adopt the stochastic Gaussian Stationary Uncorrelated
Scattering (GSUS) model leading to the following impulse
response of the channel including the receive filter [1]


$h_c(\tau, t) = \frac{1}{\sqrt{N_e}} \sum_{\nu=1}^{N_e} e^{j(2\pi f_{d,\nu} t + \Theta_\nu)}\, g_{Rc}(\tau - \tau_\nu)$ ,  (1)


where $N_e$ is the number of elementary echo paths, $g_{Rc}(\tau)$
denotes the receive filter impulse response, and the
subscript in $h_c(\cdot)$ suggests its continuous-time property. 3D
sample impulse responses can easily be determined from (1)
by independently drawing $N_e$ Doppler frequencies $f_{d,\nu}$, $N_e$
initial phases $\Theta_\nu$, and $N_e$ echo delay times $\tau_\nu$ from random
variables with Jakes, uniform, and piecewise exponential
probability density functions, respectively. As for the echo
delay times $\tau_\nu$, we use standard COST-207 Typical Urban
(TU), Bad Urban (BU) and Hilly Terrain (HT) profiles.
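
The random draws described above can be sketched in Python. The sketch is illustrative only: the function name and defaults are hypothetical, the Jakes-distributed Doppler frequency is generated as the cosine of a uniform angle, and the delay profile is assumed to be a single exponential with 1 μs decay truncated at 7 μs, which is how the Typical Urban profile is commonly stated. The Bad Urban and Hilly Terrain profiles have two exponential pieces and are not covered.

```python
import math
import random


def draw_echo_parameters(n_echoes, f_d_max, tau_max=7e-6, tau_decay=1e-6,
                         seed=None):
    """Draw (Doppler frequency, initial phase, delay) for each echo path."""
    rng = random.Random(seed)
    params = []
    for _ in range(n_echoes):
        # Jakes pdf: Doppler shift of a wave arriving from a uniform angle.
        f_d = f_d_max * math.cos(2.0 * math.pi * rng.random())
        # Initial phase uniform on [0, 2*pi).
        theta = 2.0 * math.pi * rng.random()
        # Truncated exponential delay via inversion of its CDF.
        u = rng.random()
        tau = -tau_decay * math.log(
            1.0 - u * (1.0 - math.exp(-tau_max / tau_decay))
        )
        params.append((f_d, theta, tau))
    return params
```

Each call produces one set of echo parameters, which can then be substituted into (1) to obtain one sample impulse response.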
Figure 1: GSM communication system


According to Figure 1, each received burst is corrupted
by additive Gaussian noise r(t) which is colored by the
5th order Butterworth receive filter $g_{Rc}(\tau)$ with cut-off
frequency at 75 kHz. We use this filter throughout the paper
for reasons of adjacent channel suppression (GSM carrier
spacing is 200 kHz). Upon symbol-rate sampling, a simple
derotation demodulator can be used to obtain y(k).

Maximum Likelihood Sequence Estimation (MLSE)
represents the optimum procedure to remove intersymbol
interference from a received digital communication signal
such as y(k). However, it assumes the “composite channel”,
i.e. the equivalent symbol-rate system between d(k) and
y(k) (see the dashed frame in Figure 1), to be (R1) linear
with finite impulse response which (R2) must be known.
Moreover, (R3) the noise component of y(k) is supposed to
be white. Strictly speaking, none of the requirements (R1)
to (R3) is met in typical GSM systems.

However, as for (R1), it can be shown that the composite
channel can be approximated by the linear model


for  k&K. 
otherwise 


(2) 


with 

5o(T,f)  =  co(t)  *  hc{T,t)  ,  (3) 

where $c_0(\tau)$ represents the Laurent approximation [2] of the
(non-linear) GMSK modulator, $*$ denotes the convolution
operator and the factor $j^{-\kappa}$ takes the combined effects
of differential encoding and derotation demodulation into
account. In order to obtain a (time-variant) FIR model, let
$h(\kappa, t)$ be limited to the range $\mathcal{K}$ of “relevant” indices $\kappa$.


Figure 2: GSM system with linear time-variant channel model


Using the linear model (2) for the composite channel,
Figure 1 can be redrawn according to Figure 2. In the
sequel, $h(\kappa, t)$ will simply be termed “channel”. Note that
for any fixed value $t_0$, the channel slice $h(\kappa, t_0)$ may be
mixed-phase. Let $q(t_0)$ denote its effective order, which
may vary from slice to slice due to time selective fading.
The noise sequence n(k) emerges from r(t) in Figure 1 by
symbol-rate sampling and demodulation. Although n(k) is
colored due to the Butterworth receive filter, this is tolerated
in typical GSM receivers (cf. requirement (R3)).

According to (R2), MLSE requires the knowledge of
$h(\kappa, t)$ in order to equalize a block of the demodulated
sequence y(k). As this knowledge is not available, the
problem  of  channel  estimation  arises.  This  is  indicated  in 
Fig.  2  by  a  box  termed  “ch.est.”.  Let  h{K,0  denote  the 
estimate  which  will  be  used  to  equalize  the  ^-th  burst. 

In  the  following  section,  we  focus  on  two  algorithms 
to  calculate  estimates  h{K,0-  They  will  be  compared  in 
section  3  in  terms  of  their  channel  estimation  quality  as  well 
as  the  resulting  bit  error  performance  after  MLSE. 


2.  Blind  and  non-blind  channel  estimation 

Channel estimation is a particular form of system
identification. All methods we apply in this paper suppose
the system to be mixed-phase, linear and to have a finite
impulse response. With time-variance being relatively slow
in both GSM-900 and DCS-1800 applications
((226 Hz)$^{-1}$ ≈ 4.4 ms > T), the channel can also
be assumed piecewise (quasi) time-invariant, i.e. time-invariant
over a certain number $\Delta k$ of bit periods T

$h(\kappa, t) \approx h(\kappa, t_0)$ for $|t - t_0| < \Delta k\, T/2$.  (4)

If at most $\Delta k$ consecutive samples of y(k) are observed
at once, y(k) can be considered quasi stationary so that
the respective channel estimation algorithm may assume
a mixed-phase linear time-invariant FIR system.
Generally, a system can be identified with or without
reference data (training sequences).

Figure 3: GSM “normal” burst (58 data bits, 26 training bits $f(0) \ldots f(25)$, 58 data bits $d(58) \ldots d(115)$, 3 tail bits; burst duration 0.577 ms = 156.25 bit periods)


As an example for a channel estimation approach based
on training sequences, consider the cross-correlation (CC)
scheme used in state-of-the-art GSM receivers. According
to Figure 3, each “normal” burst contains a training
sequence $f(k)$ of 26 bits surrounded by two packets of
58 data bits emerging from the information sequence $d(k)$.
On the assumption of time-invariance over $\Delta k = 21$ bit
periods, channel estimates $\hat{h}(\kappa, i)$ can be derived from
the received sequence sampled at symbol rate by using
the sample cross-correlation between the demodulated
(corrupted) and the stored (ideal) training sequences $v(k)$
and $f(k)$, respectively (see Fig. 2). For any given order $q$ of
the FIR system to be estimated, the estimate is given by


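$$\begin{bmatrix} \hat{h}(0, i) \\ \vdots \\ \hat{h}(q, i) \end{bmatrix} = F \begin{bmatrix} v(10 - q) \\ \vdots \\ v(25) \end{bmatrix} \qquad (5)$$

where $F$ denotes the $(q+1) \times (q+16)$ Toeplitz matrix
containing the orthogonal part of the training sequence

$$F = \begin{bmatrix} f(20) & \cdots & f(5) & & 0 \\ & \ddots & & \ddots & \\ 0 & & f(20) & \cdots & f(5) \end{bmatrix}. \qquad (6)$$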
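
The cross-correlation idea can be illustrated with a minimal, noise-free sketch. It assumes the 26-bit pattern commonly tabulated as GSM training sequence code 0 (the text above does not list the bits), maps bits to ±1, and correlates the received training portion with the 16 central bits $f(5) \ldots f(20)$. The function names, the example channel and the index alignment of $v(k)$ are illustrative choices and need not coincide with the layout of eq. (5).

```python
# Assumption: bit pattern commonly tabulated as GSM training sequence code 0.
TSC0_BITS = "00100101110000100010010111"
f = [1 - 2 * int(b) for b in TSC0_BITS]  # bit 0 -> +1, bit 1 -> -1


def received_training(h, f):
    """Noise-free symbol-rate output v(k) = sum_m h(m) f(k - m), k = 0..25."""
    return [
        sum(h[m] * f[k - m] for m in range(len(h)) if k - m >= 0)
        for k in range(len(f))
    ]


def cc_estimate(v, f, q):
    """Correlate v with the 16 central training bits f(5)..f(20)."""
    return [
        sum(f[n] * v[n + kappa] for n in range(5, 21)) / 16
        for kappa in range(q + 1)
    ]


h_true = [0.8, -0.5 + 0.2j, 0.3j, 0.1]   # hypothetical channel, q = 3
h_est = cc_estimate(received_training(h_true, f), f, q=3)
```

Because the central 16 bits of this sequence are orthogonal to the full sequence for shifts of up to five bits, the noise-free estimate reproduces the channel exactly for $q \le 5$; with noise or a time-variant channel the estimate is only approximate.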


N.B.: Although the above channel estimation scheme
supposes time-invariance of the channel over a period of
$\Delta k = 21$ training bits only, the resulting estimate is used
for MLSE on the adjacent data fields. As the channel
coefficients might already have changed in the data fields,
there is an implicit assumption of quasi time-invariance
over one burst ($\Delta k = 142$) in this concept.

The repeated transmission of training sequences leaves
a GSM system with an overhead capacity of 26/116 =
22.4%. This capacity could be used for other purposes if the
channel were estimated from the received signal only (blind
system identification). As just one symbol-rate sampled
received sequence is available in GSM mobile units (in the
downlink), this can only be accomplished by exploiting HOS
of the (quasi) stationary demodulated sequence $y(k)$.

Remark: If the sampling period were a fraction of $T$ or,
alternatively, the symbol-rate sampled signals received by
several antennae were interleaved (in the uplink, e.g.), the
resulting demodulated sequence would be (quasi) cyclostationary.
Generally, Second Order Cyclostationary Statistics (SOCS)
are sufficient to retrieve the channel. However, there are
“singular” channel classes which cannot be identified [3].
We have shown in [4] that singular channels represent
a severe limitation to SOCS-based methods because
estimation performance from few samples is heavily
affected if subchannel zeros are just “close” to each other.
As this cannot be prevented in a mobile environment,
SOCS-based algorithms are not considered in this paper.

In  summary,  we  have  used  the  following  criteria  for  the 
selection  of  a  blind  channel  estimation  algorithm: 

(1)  Reliable  channel  estimates  must  be  obtained  from 
142  samples  of  the  demodulated  sequence  y(k)  only. 

(2)  This  should  apply  to  arbitrary  channels  (even  if  there 
are  zeros  on/close  to  the  complex  plane’s  unit  circle). 

(3)  As  the  effective  channel  order  is  unknown  (and  time- 
variant),  an  order  misfit  must  not  represent  a  problem. 

(4)  The  estimates  should  be  as  robust  as  possible  with 
respect  to  stationary  additive  Gaussian  noise. 

We have selected the EIGENVECTOR APPROACH TO
BLIND IDENTIFICATION (EVI) by Kammeyer, Jelonnek,
and Boss [5, 6], which is based on 4th (and 2nd) order
statistics, because recent simulation results suggested that it
meets the above criteria. For linear modulation with raised
cosine transmit and receive filtering, we have demonstrated
in [7] that EVI can blindly estimate COST-207 channels
from 142 samples within a normalized mean square error
bound of about 5 per cent (at a constant SNR of 7 dB).
EVI's estimation performance was also compared with
methods based on SOCS [4] and HOS [7]: On the above
conditions, EVI's estimation performance was found to be
superior in both cases. Finally, for GSM data transmission
over COST-207 mobile channels, we have compared EVI
with the optimum non-blind least squares (LS) scheme in
terms of the resulting bit error rate (BER) after MLSE.
We have demonstrated that EVI entails an average SNR
loss of 1.1 dB only [8]. While the channel used for those
simulations was time-invariant within each burst, we apply
true time-variant filtering in this paper and investigate the
effects on channel estimation as well as BER performance.


3.  Simulation  results 


Fig. 4 shows the magnitude impulse response $|g_0(\tau, t)|$
according to (3), where $h_c(\tau, t)$ is a sample COST-207 Hilly
Terrain (HT) channel obtained from (1) with $N_e = 100$.
Both time axes are normalized to the GSM bit period $T \approx
3.7\,\mu\mathrm{s}$. The max. Doppler frequency is set to $f_{d,\max} =
200$ Hz. For GSM-900 and DCS-1800, this corresponds to
velocities of the mobile unit in the ranges 225…243 km/h
and 115…126 km/h, respectively. For Fig. 4, eq. (3) is
evaluated over a $t$ range covering 3 min. Doppler periods
$T_{d,\min} = 1/f_{d,\max} = 5$ ms, i.e. 15 ms, or equivalently,
4062 bit periods or 26 burst periods. The surface lines are
obtained by sampling $|g_0(\tau, t)|$ four times each bit period on
the $\tau$ axis and once each burst period ($156.25\,T$, cf. Fig. 3)
on the $t$ axis. Note that the magnitude impulse response
$|h(\kappa, t)|$ of the linear model (2), which is the time-variant
FIR system to be estimated, can also be seen at $\tau = \kappa T$.
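
The velocity ranges and window lengths quoted above can be checked with a short calculation. The sketch below assumes the relation $v = f_{d,\max}\, c / f_c$ and the band edges 890…960 MHz (GSM-900) and 1710…1880 MHz (DCS-1800), together with a bit period of $48/13\,\mu\mathrm{s}$; none of these values is stated in the text, so they should be read as assumptions.

```python
C = 299_792_458.0            # speed of light in m/s
F_D_MAX = 200.0              # maximum Doppler frequency in Hz (from the text)
T_BIT = 48e-6 / 13           # assumed GSM bit period in s (about 3.69 us)
BURST_PERIODS = 156.25       # burst length in bit periods

# Assumed band edges in Hz (lower, upper); not given in the text.
BANDS = {"GSM-900": (890e6, 960e6), "DCS-1800": (1710e6, 1880e6)}


def speed_kmh(f_doppler_hz, f_carrier_hz):
    """Mobile speed producing the given maximum Doppler shift."""
    return f_doppler_hz * C / f_carrier_hz * 3.6


# Lowest speed belongs to the upper band edge, highest to the lower one.
speeds = {
    name: (round(speed_kmh(F_D_MAX, hi)), round(speed_kmh(F_D_MAX, lo)))
    for name, (lo, hi) in BANDS.items()
}

window_s = 3 / F_D_MAX                      # three Doppler periods = 15 ms
window_bits = window_s / T_BIT              # about 4062.5 bit periods
window_bursts = window_bits / BURST_PERIODS # about 26 burst periods
```

Under these assumptions `speeds` evaluates to 225…243 km/h for GSM-900 and 115…126 km/h for DCS-1800, in line with the figures quoted in the text.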


Fig.  4:  Linear  model  of  a  sample  Hilly  Terrain  composite  channel 


For the simulation of a GSM data transmission link
according to Fig. 1, $h_c(\tau, t)$ is required rather than $g_0(\tau, t)$.
However, $|h_c(\tau, t)|$ looks very much like Fig. 4 because
of the bell-shaped form of the Laurent impulse $c_0(\tau)$. The
“hills” are just slightly narrower (in the direction of $\tau$).

In order to ensure (i) meaningful measurements of BER
and (ii) a satisfactory approximation of a GSUS channel
by a sample impulse response $h_c(\tau, t)$, we use a range of
$t$ for the simulations which is much wider than the one
displayed in Fig. 4. Let $D_i$ denote the $i$-th burst containing
156 bits with values from $\{-1, 1\}$. For the results given
below, a total of 2000 bursts $D_0, \ldots, D_{1999}$ is transmitted,
i.e. 232000 information bits. Since at most each 4th burst is
sent to/from the same mobile station, this covers a $t$ range
of 8000 burst periods, i.e. 4.6 s or 924 min. Doppler periods.
To take the channel's time-variance within each burst period
into account, $h_c(\tau, t)$ is sampled seven times per burst at
$t_{i,\mu} = (4 \cdot 156.25\,T)\,i + (26\,T)\,\mu$ with $\mu = 0, \ldots, 6$ and
then interpolated linearly.

Figure 5: NMSE (above) and BER (below) in terms of SNR for the estimates of three sample COST-207 GSUS channels: a) Bad Urban (BU) channel (L = 4), b) Typical Urban (TU) channel (L = 3), c) Hilly Terrain (HT) channel (L = 6). Solid: blind EVI channel estimation, dashed: non-blind CC channel estimation, dotted: “ideal” non-blind channel estimation.
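
As a rough illustration of this sampling grid, the following sketch computes the seven slice instants of one burst in units of $T$, the total time span covered by 2000 bursts, and a linear interpolation between two neighbouring slices. The bit period of $48/13\,\mu\mathrm{s}$, the helper names and the two-tap example slices are assumptions made for the example only.

```python
BURST_PERIODS = 156.25       # burst length in bit periods T
T_BIT = 48e-6 / 13           # assumed GSM bit period in s (about 3.69 us)


def slice_instant(i, mu):
    """Sampling instant in units of T for burst i and slice mu = 0..6."""
    return 4 * BURST_PERIODS * i + 26 * mu


def interpolate(h_a, h_b, frac):
    """Linear interpolation between two neighbouring channel slices."""
    return [(1 - frac) * a + frac * b for a, b in zip(h_a, h_b)]


instants = [slice_instant(1, mu) for mu in range(7)]   # slices of burst i = 1
span_s = 2000 * 4 * BURST_PERIODS * T_BIT              # roughly 4.6 s
doppler_periods = span_s / 5e-3                        # roughly 920 periods
midway = interpolate([1.0, 0.0], [0.0, 1.0j], 0.5)     # hypothetical slices
```

The seven instants of a burst are spaced by $26\,T$, so they span 156 of the 156.25 bit periods of the burst.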

For the following description of the simulation procedure,
assume that a sample impulse response $h_c(\tau, t)$ and a
value of the mean signal-to-noise ratio SNR have been selected.
Referring to Fig. 1, the burst $D_i$ is encoded, modulated, and
then propagated through the time-variant channel $h_c(\tau, t)$,
where $t_{i,0} \le t \le t_{i,6}$. Gaussian noise $r(t)$, colored by the
receive filter, is added according to SNR. Upon symbol-
rate sampling and demodulation, 148 samples of $y(k)$ are
obtained. Then, both the non-blind cross-correlation (CC)
and the blind EVI channel estimation schemes are applied
to $y(k)$, where the former utilizes the training midamble
(see eq. (5)) while the latter uses 142 samples¹ of $y(k)$.
Both approaches are given the number $L = q + 1$ of channel
coefficients to be estimated, where $L$ is calculated from the
effective length of the sample power delay spectrum of the
GSUS channel. Note that the actual mean effective length
$\bar{q} + 1$ of the channel $h(\kappa, t)$ may well be shorter due to
time-selective fading (see Fig. 4 at $t \approx 1200\,T$, e.g.). The
resulting channel estimates $\hat{h}(\kappa, i)$ are then passed to the
Viterbi detector to obtain the estimated bursts $\hat{D}_i$.

In the frame of Monte-Carlo runs, this procedure is
executed for 2000 bursts $D_0, \ldots, D_{1999}$ and SNR values
in the range from 0 to 16 dB. Finally, BER is calculated
from all bursts $D_i$ and $\hat{D}_i$ transmitted at a given SNR.

¹ For EVI, the training sequence was replaced with 26 additional data bits.


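For each of the seven true channel slices $h(\kappa, t_{i,\mu})$ per
burst ($\mu = 0, \ldots, 6$), the Normalized Mean Square Error
(NMSE) of the associated estimate $\hat{h}(\kappa, i)$ is calculated².
Averaging over all values of $i$ delivers the quality measure

$$\mathrm{NMSE}(\mu) = \frac{1}{2000} \sum_{i=0}^{1999} \frac{\sum_{\kappa} |\hat{h}(\kappa, i) - h(\kappa, t_{i,\mu})|^2}{\sum_{\kappa} |h(\kappa, t_{i,\mu})|^2} \qquad (7)$$

at seven instants $26\,T\mu$ per burst. Finally, a weighted
average over all values of $\mu$ is calculated to obtain an overall
measure (denoted NMSE) for each value of SNR.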
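
A per-burst NMSE of this kind, including the multiplication of each estimate by an optimum complex constant mentioned in the footnote on blind methods, might be computed as in the sketch below. The least-squares choice of the constant and the function names are assumptions; the text does not spell out how the constant is obtained.

```python
def optimum_scale(h_est, h_true):
    """Complex factor c minimising sum |c * h_est - h_true|^2 (least squares)."""
    num = sum(e.conjugate() * t for e, t in zip(h_est, h_true))
    den = sum(abs(e) ** 2 for e in h_est)
    return num / den


def nmse(h_est, h_true):
    """Normalized squared error after removing the unknown complex factor."""
    c = optimum_scale(h_est, h_true)
    err = sum(abs(c * e - t) ** 2 for e, t in zip(h_est, h_true))
    return err / sum(abs(t) ** 2 for t in h_true)


h_true = [0.8, -0.5 + 0.2j, 0.3j]            # hypothetical true channel slice
rotated = [2j * t for t in h_true]           # same shape, wrong complex factor
nmse_rotated = nmse(rotated, h_true)         # essentially zero
nmse_misfit = nmse([1, 1], [1, 0])           # 0.5 for this toy pair
```

Averaging `nmse` over all bursts for a fixed slice index would then give one point of the curve NMSE($\mu$).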

Figure 5: For three sample COST-207 GSUS channels,
this figure displays the performance of both the channel
estimation algorithms and the Viterbi detector in terms of
SNR. The upper row of subplots displays the NMSE values
on a logarithmic scale, while the bottom row shows the bit
error rates (BER). For Figures 5a, b, and c, a Bad
Urban (BU), a Typical Urban (TU), and a Hilly Terrain (HT)
sample channel is used, where the number of coefficients
of each estimate is set to L = 4, 3, and 6, respectively.
The solid lines refer to EVI, while the dashed lines indicate
the performance of the CC approach. All subplots contain
two lines of either type, where the bottom one is obtained
by suppressing time-variance of the channel within each
burst (although this is in accordance with the assumption
made by the channel estimation and Viterbi algorithms, it is
quite unrealistic and meant for reference only). Note that all
curves suffer from the linearity assumption for the GMSK
modulator as well as from the order misfit between $L$ and
the mean effective length $\bar{q} + 1$ of the current channel.
From the bottom CC and EVI lines in the upper subplots,
we realize that for EVI, NMSE is 1.3 to three times as high
as for CC, where the factor 1.3 holds for low values of SNR,
while 3 applies above 14 dB. From the upper solid and
dashed lines, however, we realize that CC's estimates are
heavily affected by time-variance within each burst period
while those of EVI degrade moderately so that estimation
quality becomes rather similar. This is due to the fact
that non-blind approaches have no information about the
channel coefficients outside the training midamble, while
blind approaches estimate some sort of mean channel slice.
Note that for the TU channel, EVI even outperforms CC.

With the bottom subplots of Fig. 5, dotted lines are added
for comparison. They are obtained when using “ideal” non-blind
channel estimates for MLSE, i.e. $\hat{h}(\kappa, i) = h(\kappa, t_{i,3})$
for $\kappa = 0, \ldots, L - 1$. For the channels with suppressed
time-variance within each burst (bottom three lines), we
realize that at BER ≈ 2%, blind EVI channel estimation
requires about 1 dB (BU), 0.4 dB (TU), and 2.2 dB (HT)
more in SNR than the non-blind CC scheme. If the
quasi time-invariance assumption over each burst is violated
(upper three lines), BER levels off for high values of SNR.
For the BU channel, again, EVI requires about 1 dB more
in SNR than CC. While this loss falls to zero for the TU
channel, it increases to about 2.5 dB for the sample HT
channel. Averaging over the three sample channels results
in a mean loss of about 1.2 dB.

² As blind system identification methods cannot estimate one complex
factor, each estimate was multiplied with the optimum constant (⇒ min.
Euclidean distance from the true channel) before NMSE was calculated.
To ensure fairness of the comparison, this is done for CC's estimates, too.


Figure 6: NMSE($\mu$) in terms of the channel slice index $\mu$ per burst
for the EVI (solid), CC (dashed) and “ideal” (dotted)
estimates of the BU channel at SNR = 10 dB

Figure 6: For the Bad Urban (BU) simulation results
at 10 dB, this figure provides a detailed view of channel
estimation quality in terms of the bit index per burst. It
shows the evolution of NMSE as a function of the channel
slice index $\mu$. Obviously, “ideal” estimates (dotted line)
based on the training midamble have a vanishing value of
NMSE($\mu$) in the center of each burst only ($\mu = 3$), while
it increases as the burst boundaries are approached. At
higher values, CC's estimates (dashed line) reveal a similar
behavior. Although EVI's estimation quality is inferior at
the center, it degrades just slightly towards the beginning
and the end of the burst, as can be seen from the solid line.


4.  Conclusions 

We  have  demonstrated  that  for  GSM  data  transmission 
over mobile communication channels, the HOS-based blind channel
estimation  method  EVI  entails  an  SNR  loss  of  1.2  dB  only 
(averaged  over  three  COST-207  propagation  environments) 
compared  with  the  non-blind  cross-correlation  scheme.  As 
just  142  samples  of  the  demodulated  sequence  can  be  used 
for  channel  estimation,  we  consider  these  results  quite 
remarkable  for  an  algorithm  based  on  HOS. 

Abstract

This  submission  proposes  the  use  of  the  unnormalised 
bispectrum  as  a  signal  processing  tool  for  the  diagnosis 
of  inverter  fed  induction  machine  fault  conditions. 
Increasingly,  induction  machines  are  supplied  from  non- 
sinusoidal,  variable  speed  sources  which  increases  the 
complexity  and  magnitude  of  the  machine  cage  vibration. 
In  addition,  contamination  of  the  vibration  signal  from 
both  known  and  unknown  sources  makes  accurate  fault 
detection more difficult. This paper addresses both
issues: experimental results are presented, and it is shown
that the unnormalised bispectrum improves on the
diagnostic capability of more conventional second order
statistical measures.


1. Introduction

The  induction  machine  is  the  single  most  common 
electromechanical  energy  conversion  device  available.  It 
represents  the  backbone  of  the  electrical  engineering 
industry,  and  is  generally  a  reliable  and  robust  system 
component.  That  said,  the  induction  machine  does  fail, 
most  usually  as  the  result  of  aging  or  poor  construction. 
If  failure  is  of  a  catastrophic  nature,  danger  to  plant  and 
personnel  may  follow.  Without  a  comprehensive 
mechanism  for  induction  machine  Condition  Monitoring 
(CM),  incipient  machine  failure  often  goes  undetected, 
increasing  the  risk  of  catastrophic  breakdown.  If  a  CM 
regime  is  implemented,  the  impending  failure  can  be 
detected,  a  risk  assessment  made  and  remedial  action 
scheduled. For example, an induction machine that is
part of a critical process might develop a minor fault
which is subsequently detected by an accurate
induction machine CM system. If the fault is only likely
to  deteriorate  gradually  and  not  result  in  catastrophic 
failure,  the  machine  can  continue  to  be  operated  and  the 
subsequent  deterioration  monitored.  Maintenance  can  be 
planned  for  the  next  available  shutdown,  preserving  the 
criticality  of  the  machine  without  compromising  safety. 

Many  methods  of  induction  machine  CM  exist,  but  in 
general  diagnosis  is  obtained  by  evaluation  of  a  second 
order  statistical  measure,  the  power  spectral  density  (psd) 


of  a  characteristic  machine  signal.  That  signal  can  be  any 
one  of  a  number  of  system  properties,  but  is  usually  the 
stator  electromagnetic  vibration,  motor  phase  current,  air 
gap axial flux or emitted noise [1-4]. Each of these
methods has certain advantages and allows induction
machine CM to a certain extent. However, no panoptic
means of fault detection has been found as yet. It is the
authors' opinion that this lack of integrity is symptomatic
of the processing techniques employed, rather than the
signals being monitored.

In  recent  years  there  has  been  an  increase  in  research 
into  Higher  Order  Statistics  (HOS).  Initially  this  research 
concentrated  primarily  on  the  theoretical  aspects  of  HOS, 
with little recourse to practical applications. Recent work
[5-7] has suggested that third order measures convey
more pertinent CM information than the power spectrum
alone; consequently, more attention is being paid to the
application of HOS measures to real life issues. A
proportion  of  the  literature  has  been  devoted  to  the  use  of 
HOS  as  a  CM  tool,  i.e.  for  the  diagnosis  of  electrical 
machine  faults  [1],  gearbox  faults  [8]  and  structural  fault 
diagnosis  [9].  It  is  contended  that  the  application  of  HOS 
techniques  to  the  CM  of  real  electrical  systems  is  timely 
[10],  as  rotating  machine  faults  give  rise  to  a  non-linear 
mode  of  operation  and  HOS  techniques  allow  single 
sensor  diagnosis  of  system  non-linearities.  Further,  the 
application  of  HOS  to  machine  vibration  signals  results  in 
the  reduction  of  Gaussian  noise.  In  addition,  the  retention 
of  all  signal  phase  information  is  an  inherent  feature  of 
the  calculation  of  HOS. 

To the best of the authors' knowledge, none of the
previous  work  has  quantitatively  compared  HOS  and  the 
psd  for  the  detection  of  electrical  machine  faults.  This 
paper  will  introduce  variable  frequency  supplies  and  will 
compare  the  abilities  of  the  power  spectrum  and  the 
unnormalised  bispectrum  to  diagnose  inverter  fed 
induction  machine  fault  conditions  using  contaminated 
vibration  signals. 

2.  Inverter  Drives 

A  relatively  recent  advance  in  the  utilization  of 
induction  machines  is  the  wide  use  of  variable  speed 
drives, for example, the Pulse Width Modulated (PWM)
inverter.  These  drives  enjoy  huge  popularity  as  they 
allow  flexible,  variable  speed  operation  of  the  induction 
machine.  Figure  1  shows  a  simplistic  PWM  inverter 
drive. 


Figure 1. Simple PWM inverter drive (fed from a three phase supply).

The  PWM  converter  rectifies  the  three  phase  input 
voltage  and  controls  the  magnitude  of  the  voltage  in  the 
dc  link.  The  capacitor  is  primarily  for  smoothing  the  dc 
link  voltage,  and  the  PWM  inverter  is  used  to  control  the 
frequency  of  the  output  voltage  waveform.  Figure  2 
shows  a  typical  single  phase  output  voltage  waveform 
from  a  PWM  inverter  supply. 


Figure 2. Single phase PWM output (output voltage and its fundamental versus time).

Clearly,  the  output  voltage  is  far  from  sinusoidal  and 
contains  many  harmonic  components.  One  of  the 
consequences  of  the  use  of  PWM  inverters  is  a 
subsequent  increase  in  the  magnitude  and  complexity  of 
machine  vibration,  due  to  the  high  harmonic  content  of 
the  motor  stator  voltage  [11].  This  has  implications  for 
the  mechanical  reliability  of  inverter  fed  induction 
machines  and  the  levels  of  acoustic  noise  emitted.  A 
further  consideration  when  using  PWM  inverters  is  the 
derating  of  the  machine  because  of  additional  heating 
effects  caused  by  the  non  sinusoidal  supply  harmonics. 
Any  diagnostic  system  based  on  the  measurement  of 
machine  vibration  must  be  able  to  interpret  this  complex 
machine  signal  and  produce  definitive  and  reliable  fault 
diagnosis.  For  this  to  happen,  important  or  relevant 
features  identifying  each  machine  condition  must  be 
monitored,  extracted  and  diagnosis  reached  if  these 
features  reach  prescribed  levels. 


3.  Contamination  of  the  Vibration  Signal 

In  any  industrial  environment  the  ambient  or 
background  vibration  at  any  location  varies  temporally  as 
machines  operate  and  loading  criteria  fluctuate.  The 
ambient  vibration  also  varies  spatially  as  the  material 
properties  of  foundations  and  machinery  mountings 
change.  This  impacts  directly  on  any  vibration  based  CM 
system  which  is  in  use.  Clearly,  any  change  in 
background  vibration  manifests  itself  as  a  change  in  the 
vibration  of  the  machine  under  observation,  resulting  in  a 
variation  of  any  signal  characterising  machine 
performance  [12].  Further  contamination  of  the  machine 
vibration  signal  may  originate  from  interference  present 
in  the  electrical  supply,  or  from  other,  unknown  sources. 

It  is  clear  that  any  contamination  of  the  induction 
machine  vibration  signal  will  result  in  degradation  of  the 
information  contained  therein,  thus  making  interpretation 
of  the  signal  more  difficult.  It  is  helpful  therefore,  to 
have  a  CM  system  as  impervious  to  contaminants  as 
possible,  such  that  important  features  contained  in  the 
machine  vibration  are  retained.  It  has  been  shown  [13] 
that the evaluation of a signal's HOS results in the
elimination  of  additive  Gaussian  noise.  However,  it  has 
also  been  shown  [14]  that  for  data  records  of  fixed  length, 
the  noise  level  is  attenuated  and  not  removed  completely. 
Assuming  we  can  define  these  vibration  signal 
contaminants  to  be  random,  and  to  have  a  Gaussian 
distribution  and  zero  mean  [9],  it  is  proposed  that  an 
induction  machine  CM  system  based  on  HOS,  and  in 
particular  the  unnormalised  bispectrum,  will  result  in  the 
reduction  of  noise  in  the  vibration  signal  and  allow  the 
retention  of  important  fault  characteristic  features. 

4.  Definitions 

The  decomposition  of  a  signal  into  its  spectral 
components  provides  a  great  deal  of  information.  The 
Fourier  Transform  (FT)  is  the  most  common  way  of 
computing  the  spectral  components  of  a  signal.  Consider 
a discrete-time series $x(n)$, $n = 0, \pm 1, \pm 2, \ldots$; its FT,
$X(k)$, is described by

$$X(k) = \sum_{n=-\infty}^{\infty} x(n) \exp(-j 2\pi k n) \qquad (1)$$

The most frequently used tool for signal processing
over the past thirty years has been the evaluation of a
signal's power spectral density (psd). The psd is a
measure of the spread of the signal power over a range of
frequencies, considering each frequency to be
independent of all others. The psd of a signal can be
given in terms of its FT by

$$P(k) = E[X(k) X^*(k)] \qquad (2)$$

where $E[\,\cdot\,]$ represents the expectation operator and
$X^*(\cdot)$ is the complex conjugate of $X(\cdot)$. As an extension to
the estimation of these second order measures, a product
of Fourier Transforms can be formed at different
frequencies to produce the third order measure called the
bispectrum. The bispectrum considers the degree of
coupling between three frequencies, $k$, $l$ and $k+l$. The
bispectrum of $x(n)$ is described by

$$B(k, l) = E[X(k) X(l) X^*(k+l)] \qquad (3)$$

Clearly, the bispectrum is a function of
two frequency indices and as such is more
computationally intensive to estimate than the psd. In the
calculation of the bispectrum only bifrequencies in the
Inner Triangle (IT) of the principal domain will be
considered, in order to minimise the possibility of
sampling rate discrepancies degrading the bispectral
estimation [15], ensuring best reliability.
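
The two measures can be estimated by replacing the expectation with an average over windowed, overlapping data segments. The following sketch is one possible direct estimator under stated assumptions: a plain DFT kept free of external libraries, a Hann window, 70% overlap, and the region $l \le k$, $k + l < N/2$ taken as a stand-in for the inner triangle. The function names and the toy signal are illustrative and not taken from the paper.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) discrete Fourier transform (kept dependency-free)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def psd_and_bispectrum(x, nfft, overlap=0.7):
    """Average |X(k)|^2 and X(k) X(l) X*(k+l) over Hann-windowed segments."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / nfft) for t in range(nfft)]
    step = max(1, round(nfft * (1 - overlap)))
    starts = range(0, len(x) - nfft + 1, step)
    half = nfft // 2
    psd = [0.0] * half
    bis = {}
    for s in starts:
        X = dft([x[s + t] * window[t] for t in range(nfft)])
        for k in range(half):
            psd[k] += abs(X[k]) ** 2 / len(starts)
            for l in range(k + 1):            # only l <= k
                if k + l < half:              # sum frequency below half the rate
                    term = X[k] * X[l] * X[k + l].conjugate()
                    bis[(k, l)] = bis.get((k, l), 0) + term / len(starts)
    return psd, bis


# Toy signal with phase-coupled components at bins 5, 2 and 5 + 2 = 7.
x = [
    math.cos(2 * math.pi * 5 * n / 32)
    + math.cos(2 * math.pi * 2 * n / 32)
    + math.cos(2 * math.pi * 7 * n / 32)
    for n in range(128)
]
psd, bis = psd_and_bispectrum(x, nfft=32)
peak = max(bis, key=lambda kl: abs(bis[kl]))   # bifrequency of largest magnitude
```

For this toy signal the largest bispectral magnitude occurs at the bifrequency (5, 2), because the phases of the three components add coherently in every segment, whereas the psd alone only shows three lines of equal height.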

5.  Laboratory  Work 

A  3kW,  6-pole,  squirrel  cage  induction  machine  was 
used  to  drive  an  axial  flow  screw  air  compressor  via  a 
speed  up  helical  gearbox,  the  induction  machine  being 
fed  from  a  P.W.M.  inverter  as  shown  in  Figure  3. 


Figure  3.  Laboratory  apparatus. 

The  induction  machine  vibration  was  measured  by  a 
piezoelectric  accelerometer  mounted  radially  on  the 
casing  of  the  machine.  The  ability  to  replicate  common 
induction  machine  faults  was  a  central  feature  of  the 
electromechanical  system.  In  addition  to  the  no  fault 
condition,  four  single  and  four  combined  fault  conditions 
were  seeded  on  the  induction  machine.  The  faults 
implemented  were  selected  to  reflect  common  faults 
found  in  induction  machines  in  normal  service.  After  the 
implementation  of  each  fault,  a  fifteen  minute  warm  up 
period  was  allowed  to  ensure  that  the  induction  machine 


had  reached  steady  state  operation.  The  induction 
machine conditions were as detailed in Table 1.


Table 1. Conditions implemented on test machine.

Condition   Description
1           No fault
2           One broken rotor bar
3           Two broken rotor bars
4           Stator short circuit
5           Unbalanced supply
6           Two broken rotor bars, stator short circuit
7           Two broken rotor bars, unbalanced supply
8           Stator short circuit, unbalanced supply
9           Two broken rotor bars, stator short circuit, unbalanced supply

For  each  of  the  nine  machine  states,  60  motor 
vibration  signatures  of  20480  points  were  conditioned, 
sampled at 100 kHz and acquired.

6.  Data  Processing 

Data  processing  was  achieved  in  two  stages;  induction 
machine  fault  templates  were  constructed,  then  the 
diagnostic  capabilities  of  the  psd  and  the  bispectrum  were 
evaluated. 

6.1  Fault  template  construction 

The  induction  machine  vibration  signatures  were  split 
into  two  groups,  half  being  designated  ‘template’  data 
and half ‘testing’ data. The power spectrum and
bispectrum of the 270 ‘template’ vibration signatures were
computed using a 1024 point FFT, Hanning window and
70% overlap. The frequency and bifrequency
components  of  the  ‘template’  power  spectra  and  bispectra 
were  sorted  into  a  descending  order  of  importance, 
according  to  the  product  of  the  component  mean  and 
standard  deviation,  evaluated  over  all  machine 
conditions. 

The  range  of  each  frequency  and  bifrequency 
component  of  the  psd  and  bispectrum  was  stored  for  each 
machine  condition,  giving  a  characteristic  template  for 
the  psd  and  bispectrum  for  all  machine  conditions. 

6.2  Diagnostic  evaluation 

The power spectrum and bispectrum of the ‘testing’
data with 0, 10, 20 and 30% of normally distributed,
white, additive Gaussian noise were computed, the noise
being added to simulate contamination of the vibration
signal. The frequency and bifrequency components of the
‘testing’ data were sorted as previously. These spectra
were then compared with all machine condition
‘templates’. The machine condition was only diagnosed
if all spectral components considered fell within the range
of components of a characteristic template. The condition
thus determined was compared with the machine state in
order to evaluate the accuracy of diagnosis. The whole
process was repeated with 10, 20 and 30% contamination
of the ‘template’ vibration traces.
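
The template procedure of Sections 6.1 and 6.2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the spectra are short lists of hypothetical magnitudes, complex bispectral values are reduced to their magnitude, and a diagnosis is only returned when exactly one template matches, which is an assumption about how ambiguous matches are treated.

```python
from statistics import mean, pstdev


def rank_components(spectra_by_condition):
    """Order component indices by mean * standard deviation over all conditions."""
    pooled = [s for spectra in spectra_by_condition.values() for s in spectra]
    scores = []
    for c in range(len(pooled[0])):
        values = [abs(s[c]) for s in pooled]
        scores.append(mean(values) * pstdev(values))
    return sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)


def build_templates(spectra_by_condition):
    """Store the (min, max) range of every component for each condition."""
    templates = {}
    for cond, spectra in spectra_by_condition.items():
        templates[cond] = [
            (min(abs(s[c]) for s in spectra), max(abs(s[c]) for s in spectra))
            for c in range(len(spectra[0]))
        ]
    return templates


def diagnose(spectrum, templates, components):
    """Return the one condition whose ranges contain all considered components."""
    matches = [
        cond for cond, ranges in templates.items()
        if all(ranges[c][0] <= abs(spectrum[c]) <= ranges[c][1] for c in components)
    ]
    return matches[0] if len(matches) == 1 else None


# Hypothetical 'template' spectra: two conditions, two records, three components.
train = {
    "no fault": [[1.0, 5.0, 0.2], [1.2, 5.5, 0.1]],
    "broken bar": [[3.0, 5.2, 0.2], [3.4, 5.4, 0.3]],
}
order = rank_components(train)
templates = build_templates(train)
result = diagnose([3.1, 5.3, 0.25], templates, order[:2])   # two top components
```

Varying the length of `order[:n]` corresponds to varying the number of frequency or bifrequency components considered in the diagnosis.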

7.  Results 

Figure  4  shows  the  variation  of  diagnostic  capability  of 
the  psd  for  a  range  of  important  frequency  components, 
and  with  varying  degrees  of  vibration  contamination.  In 
this  instance  the  characteristic  template  was  constructed 
with  0%  vibration  contamination. 


Figure 4. Psd diagnosis, 0% ‘template’ contamination (horizontal axis: frequency components considered).

It can be seen that maximum diagnosis was achieved
when the contamination of the ‘testing’ data was 0%, the
same as when the ‘template’ was created. Further, the
diagnosis worsened in proportion to the added
contamination. In all cases the diagnosis worsened as the
number of frequency components considered increased
after maximum diagnosis. Figure 5 shows the diagnostic
variation of the psd with 20% vibration contamination.


Figure 5. Psd diagnosis, 20% ‘template’ contamination (horizontal axis: frequency components considered).


Figure 5 shows that maximum diagnosis was attained
when the contamination of the ‘testing’ data was 20%.
Other values of vibration contamination degraded
performance. Similar results were obtained for 10 and
30% contamination of vibration in the ‘template’ data.
Figure 6 shows the diagnostic variation of the bispectrum
for 0% contamination of the vibration when creating
templates.


Figure 6. Bispectrum diagnosis, 0% ‘template’ contamination (horizontal axis: bifrequency components considered).

Figure 6 shows how the maximum diagnosis of the
bispectrum is effectively invariant to the vibration
contamination in the ‘testing’ data. Also, the diagnosis is
invariant to the number of important bifrequencies
considered. Figure 7 details bispectrum variation for 20%
vibration contamination in the template data.


Figure 7. Bispectrum diagnosis, 20% ‘Template’ contamination. Horizontal axis: bifrequency components considered.

Again it can be seen that the maximum diagnosis is
invariant to the contamination of the vibration testing
data and to the number of bifrequency components
considered. These results were also consistent for 10 and 30%
vibration contamination in the template data.

In summary, it is clear that the diagnostic success of
the psd is dependent on the degree of vibration
contamination contained in both the ‘Template’ and
‘Testing’ data. Acceptable diagnosis is only achieved
when the vibration contamination contained in the
‘Template’ data matches that of the ‘Testing’ data.
Similarly, degradation of diagnosis is proportional to the
difference between the vibration contamination contained
within the ‘template’ and ‘testing’ data. Also, it has been
shown that the psd diagnosis is dependent on the number
of frequency components of the power spectrum which
are considered to form the diagnosis, the added noise
making accurate diagnosis impossible when the less
important components were considered. Clearly, this
makes the psd a somewhat limited diagnostic device, with
accurate diagnosis dependent on the level of vibration
corruption and on choosing the optimal number of
frequency components.

In  contrast,  it  can  be  seen  that  the  bispectrum  provides 
100%  accurate  diagnosis  irrespective  of  the  vibration 
contamination  difference  between  ‘template’  and  ‘testing’ 
data.  In  addition,  this  diagnosis  is  maintained 
independent  of  the  number  of  bifrequency  components 
considered  after  maximum  diagnosis.  Clearly,  this  use  of 
HOS  reduces  the  effect  of  the  added  noise,  even  when 
considering  less  important  components  of  the  signal. 
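
The dependence of the diagnosis on the number of components considered can be illustrated with a toy nearest-template classifier. This is only a minimal sketch under stated assumptions: the function name `diagnose`, the ranking of "importance" by template magnitude, and the Euclidean distance are all hypothetical choices made for illustration, not the procedure used by the authors.

```python
import math

def diagnose(test_spectrum, templates, n_components):
    """Return the label of the template closest to test_spectrum.

    Hypothetical rule: for each template, only its n_components
    largest-magnitude components are compared with the test data.
    """
    best_label, best_dist = None, math.inf
    for label, template in templates.items():
        # Rank component indices by template magnitude, largest first.
        ranked = sorted(range(len(template)), key=lambda i: -abs(template[i]))
        keep = ranked[:n_components]
        dist = math.sqrt(sum((test_spectrum[i] - template[i]) ** 2 for i in keep))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Illustrative (made-up) templates and a contaminated test trace.
templates = {"healthy": [1.0, 0.2, 0.1, 0.0], "faulty": [0.2, 1.0, 0.1, 0.0]}
test_trace = [0.9, 0.3, 0.5, 0.4]
print(diagnose(test_trace, templates, 2))
```

In such a scheme, adding less important components brings in terms dominated by contamination, which is one way to picture the degradation reported for the psd.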

8.  Conclusions 

It is clear from the results presented that the
bispectrum offers a valid means of CM for inverter-fed
induction machines. Accurate machine condition
diagnosis  was  achieved  repeatably  over  a  number  of 
different  machine  faults.  It  is  also  evident  that  the 
bispectrum  provides  enhanced  fault  recognition  when 
compared  with  the  more  conventional  power  spectrum 
based  technique.  In  addition,  the  diagnostic  capability  of 
the  bispectrum  based  regime  is  insensitive  to  vibration 
contamination  and  retains  conclusive  diagnosis  regardless 
of  the  number  of  bispectral  components  considered. 

The  authors  feel  this  represents  an  advance  in 
diagnostic  technology,  with  the  bispectrum  clearly 
identifying  spectral  components  which  uniquely  identify 
various  fault  conditions.  New  work  is  to  be  concentrated 
on  identifying  important  components  explicitly  in  an 
attempt  to  make  the  ‘fingerprint’  approach  redundant. 
This  will  allow  predictive  analysis  to  be  made  and 
remove  the  need  for  a  priori  data. 
ABSTRACT

We model tissue as a collection of point scatterers
embedded in a uniform medium, and show that the higher-order
statistics (HOS) of the scatterer spacing distribution
can be estimated from digitized RF scan line segments
and be used in obtaining tissue signatures. Based
on our model for tissue microstructure, we estimate
resolvable periodicity and correlations among non-periodic
scatterers. Using higher-order statistics of the scattered
signal, we define as tissue “color” a quantity that
describes the scatterer spatial correlations, show how to
estimate it from the higher-order correlations of the
digitized RF scan line segments, and investigate its
potential as a tissue signature.


1  Introduction 

Ultrasound imaging is a widely used technique in the
diagnosis of tumors of soft tissues. Currently, the clinical
diagnosis is based on visual interpretation of images
by radiologists. There have been many attempts at
developing objective tissue characterization criteria on the
premise that there is much more observer-independent
information available from ultrasound than what is
currently being used. These are rooted in the fundamental
notion that biological tissues are composed of
characteristic structures whose ultrasonic properties often
change due to disease. The goal of tissue characterization
is the extraction of signatures that assume distinct
values in the presence of normal and diseased states of
tissues, such that it is possible to differentiate between
them.

Although the exact identities of the physical structures
responsible for ultrasound backscattering are generally
unknown [17], [3], they can usually be characterized
either as macroscopic or microscopic, compared
with the wavelength of the interrogating ultrasonic
beam. The tissue is often modeled as a collection
of point scatterers, embedded in a uniform
non-scattering medium. Considering the biological variabilities
associated with tissues, the spatial distribution and
the scattering strengths (scattering cross-sections)
associated with these scatterers are usually described in
statistical terms. The statistics of the inter-scatterer
spacing distribution are commonly used as tissue signatures.

*This work was supported by NSF under grant MIP-9553227.

In organs such as the liver the overall arrangement of
the tissues consists of an organized repetition of a basic
structural unit. This repeated structure is regarded as
providing resolvable, repeated scattering centers for
the propagating ultrasonic pulse [8], [10], [11], [5]. It has
been shown that a periodicity can be observed from the
liver pulse-echo data, and the estimated periodicity can
be linked to the mean scatterer spacing of the underlying
tissue [8]. It was also shown that the mean scatterer
spacing may be used as a feature for tissue characterization
of diffuse liver diseases, such as cirrhosis and
chronic active hepatitis [8], [11].

The scatterer spatial correlation has been used
in the past to model ultrasonic properties of tissue
[9], [13], [6], [?] and was shown to lead to promising tissue
signatures. In [13], it was shown that using power spectra,
deterministic, membrane-like structures as well as
structures consisting of random, diffuse structures can
be characterized. In [14], effective scatterer sizes,
concentrations and acoustic impedances were investigated
using power spectra, as potential tissue signatures. The
effective scatterer size was reported to be the most
important tissue feature sensed with the method of [14].

In this paper, we use a point scatterer model similar
to the ones in [11] and [7] to describe tissue microstructure,
with the exception that our model takes
the correlations existing among both resolvable and
non-resolvable scatterers into account. This enables us to
also consider cases of coherent long-range scatterer
distributions which have high variances associated with
the inter-scatterer spacing, i.e. almost no periodicity,
and non-periodic echoes resulting from correlated,
non-periodic components of both diffused and structural
scatterers. We assume that RF echoes are non-Gaussian,
on the grounds of empirical/theoretical justifications
presented in works such as [19], [4], [2], [18] and
[20]. Based on our model for tissue microstructure, we
develop schemes for the estimation of resolvable periodicity
as well as correlations among non-periodic scatterers.
Using higher-order statistics of the scattered signal,
we define as tissue “color” a quantity that describes the
scatterer spatial correlations, show how to evaluate it
from the higher-order correlations of the digitized RF
scan line segments, and investigate its potential as a tissue
signature. We also present preliminary evidence that even
when there is no significant periodicity in the data, we are
still able to characterize tissues using signatures based
on the higher-order cumulant structure of the scatterer
spacing distribution.

2  Theoretical  developments 

Assuming a narrow ultrasound beam, linear propagation
and weak scattering in the tissue medium, we
model the ultrasonic RF echo, $y(t)$, by

$$y(t) = h(t) * f(t) + w(t), \qquad (1)$$

where $h(t)$ is the pulse-echo wavelet, $w(t)$ is the observation
noise, and $*$ denotes convolution. The quantity
$f(t)$, which we will call the tissue response, represents
the underlying tissue microstructure.
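
A discrete-time version of model (1) can be sketched as follows. This is a minimal illustration, assuming sampled, finite-length sequences; the function names `convolve` and `rf_echo`, the noise level and the seed are hypothetical and only serve the example.

```python
import random

def convolve(h, f):
    """Full discrete linear convolution of two finite sequences."""
    out = [0.0] * (len(h) + len(f) - 1)
    for i, hi in enumerate(h):
        for j, fj in enumerate(f):
            out[i + j] += hi * fj
    return out

def rf_echo(h, f, noise_std=0.0, seed=0):
    """Simulate y = h * f + w with zero-mean Gaussian noise w."""
    rng = random.Random(seed)
    clean = convolve(h, f)
    return [c + rng.gauss(0.0, noise_std) for c in clean]

# A two-tap wavelet and a tissue response with a single point scatterer.
print(rf_echo([1.0, 0.5], [0.0, 1.0, 0.0]))
```

With `noise_std=0.0` the output is simply the wavelet shifted to the scatterer position, which is the noiseless case of (1).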

Following [3], [11] and [7] we model the structures
within tissue that are responsible for the backscattered
ultrasonic field by point scatterers organized at different
levels. Existing work such as [11] and [7] attempts to
characterize liver tissue based on a two-component point
scatterer model: (a) the diffused component, which
represents [11] randomly positioned scatterers of sufficient
concentrations to produce echo signals with circular
Gaussian statistics. The positions of individual scatterers
are assumed uncorrelated [11]; (b) the coherent
component, which represents non-randomly distributed
scatterers with long-range order [11], [10].

In this paper (see also [1]), we describe liver tissue by
a three-component model, which in addition to (a) and
(b) above, also takes into account the correlations existing
among (i) diffused scatterer locations, (ii) resolvable
and unresolvable pseudo-periodic scatterers leading to
long-range order, and (iii) collections of scatterers leading
to short-range order, but not strong enough to violate
the conditions of weak scattering or the stationarity of
the RF echo in a significant manner.

We model the tissue response $f(t)$ as:

$$f(t) = r_d(t) + r_m(t) + r_r(t), \qquad (2)$$

where $r_d(t)$ represents unresolvable diffuse scatterers
leading to fully-developed speckle with Gaussian statistics
[11]; $r_r(t)$ represents resolvable, coherent scatterers
with long-range order (pseudo-periodicity) [11]. The
quantity $r_m(t)$ represents the combined effects of
unresolvable periodicity from structural scatterers, and
correlated non-periodic components of diffused and structural
scatterers (short-range order). The terms $r_m(t)$,
$r_r(t)$ (and their contributions to the RF echo $y(t)$) are
non-Gaussian, due to contributions from periodic structures
and/or scatterers of short-range order. The use of
higher-order statistics (HOS) as the tool allows us to
properly treat the non-Gaussian components, to which
deterministic structures such as large cell assemblies and
necrotic areas inside tumors contribute.

The ultrasonic RF echo, $y(t)$, can be expressed as:

$$y(t) = y_d(t) + y_m(t) + y_r(t) + w(t), \qquad (3)$$

where $y_d(t) = r_d(t) * h(t)$, $y_m(t) = r_m(t) * h(t)$ and
$y_r(t) = r_r(t) * h(t)$ are respectively called [1] the diffused
(DC), mixed (MC) and resolvable-periodic (RPC)
components of $y(t)$. The observation noise $w(t)$ is
assumed zero-mean, Gaussian and uncorrelated with
$y_i(t)$, $i = d, m, r$.

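Under the assumption that $y_i(t)$, $i = d, m, r$ are zero-mean
and mutually uncorrelated, the third-order cumulants
$c_3^{y}(\tau_1,\tau_2)$ [15] of the process $y(t)$ can be written as

$$c_3^{y}(\tau_1,\tau_2) = c_3^{y_d}(\tau_1,\tau_2) + c_3^{y_m}(\tau_1,\tau_2) + c_3^{y_r}(\tau_1,\tau_2) + c_3^{w}(\tau_1,\tau_2), \qquad (4)$$

where $c_3^{y_d}(\tau_1,\tau_2)$, $c_3^{y_m}(\tau_1,\tau_2)$, $c_3^{y_r}(\tau_1,\tau_2)$, $c_3^{w}(\tau_1,\tau_2)$
respectively denote the third-order cumulants of $y_d(t)$,
$y_m(t)$, $y_r(t)$ and the observation noise $w(t)$.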

Since the noise and the diffused component are Gaussian
processes,

$$c_3^{w}(\tau_1,\tau_2) = c_3^{y_d}(\tau_1,\tau_2) = 0, \quad \forall (\tau_1,\tau_2). \qquad (5)$$

The suppression of the Gaussian component of the tissue
response is an important advantage because it does not
carry any information about tissue.
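
A sample estimate of the third-order cumulant used in (4) and (5) can be sketched as below. This is a minimal illustration assuming a real-valued, finite record; the function name `third_order_cumulant` and the normalization by the record length are choices made here for the example, and other estimators exist.

```python
def third_order_cumulant(x, tau1, tau2):
    """Sample estimate of c3(tau1, tau2) for a real sequence x.

    The sample mean is removed first; products that would index outside
    the record are skipped, and the sum is normalized by len(x).
    """
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    total = 0.0
    for t in range(n):
        a, b = t + tau1, t + tau2
        if 0 <= a < n and 0 <= b < n:
            total += z[t] * z[a] * z[b]
    return total / n

# A symmetric sequence has zero third-order cumulant at the origin,
# a skewed one does not.
print(third_order_cumulant([1.0, -1.0, 1.0, -1.0], 0, 0))
print(third_order_cumulant([2.0, -1.0, -1.0], 0, 0))
```

For a Gaussian process the true value is zero at every lag pair, so estimates from Gaussian data shrink toward zero as the record grows, which is the suppression property stated in (5).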

The mixed component $y_m(t)$ of $y(t)$ is modeled as:

$$y_m(t) = \Big[\sum_{p=1}^{N_m} b_p\,\delta(t - \lambda_p)\Big] * h(t). \qquad (6)$$
Let us assume that: (A1) $\lambda_i$ is uncorrelated to $b_j$,
$i, j = 1, 2, \ldots, N_m$; (A2) $\lambda_i$, $i = 1, 2, \ldots$, form a
stationary process; (A3) $y_m(t)$ is a zero-mean process.
Under (A3) the third-order cumulants of $y_m(t)$ equal

$$c_3^{y_m}(\tau_1,\tau_2) = E\{y_m(t)\,y_m(t+\tau_1)\,y_m(t+\tau_2)\} = M_3^{m}(\tau_1,\tau_2) * c_3^{h}(\tau_1,\tau_2), \qquad (7)$$

with

$$M_3^{m}(\tau_1,\tau_2) = \sum_{p=1}^{N_m}\sum_{q=1}^{N_m}\sum_{s=1}^{N_m} E\{b_p b_q b_s\}\; E\{\delta(t-\lambda_p)\,\delta(t+\tau_1-\lambda_q)\,\delta(t+\tau_2-\lambda_s)\}, \qquad (8)$$

where (A1) has been used in obtaining (8).

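Let the joint probability density function (arbitrary)
for the scatterer location triplets $\lambda_p, \lambda_q, \lambda_s$ be given by
$f_m(\lambda_p, \lambda_q, \lambda_s)$. Then from (8) we obtain:

$$\begin{aligned}
M_3^{m}(\tau_1,\tau_2) &= \sum_{p=1}^{N_m}\sum_{q=1}^{N_m}\sum_{s=1}^{N_m} E\{b_p b_q b_s\} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \delta(t-\lambda_p)\,\delta(t+\tau_1-\lambda_q)\,\delta(t+\tau_2-\lambda_s)\, f_m(\lambda_p,\lambda_q,\lambda_s)\, d\lambda_p\, d\lambda_q\, d\lambda_s \\
&= \sum_{p=1}^{N_m}\sum_{q=1}^{N_m}\sum_{s=1}^{N_m} E\{b_p b_q b_s\}\, f_m(t,\, t+\tau_1,\, t+\tau_2). \qquad (9)
\end{aligned}$$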

Under the assumption (A2), the probability density
function $f_m(t, t+\tau_1, t+\tau_2)$ does not depend on the
particular $t$ chosen. Thus, $f_m(t, t+\tau_1, t+\tau_2)$ can be
written as $f_m(\tau_1,\tau_2)$. Therefore, from (7) and (9) we
obtain:

$$c_3^{y_m}(\tau_1,\tau_2) = K_m f_m(\tau_1,\tau_2) * c_3^{h}(\tau_1,\tau_2), \qquad (10)$$

where $K_m$ is given by

$$K_m = \sum_{p=1}^{N_m}\sum_{q=1}^{N_m}\sum_{s=1}^{N_m} E\{b_p b_q b_s\}. \qquad (11)$$

The resolvable periodic component has been shown to be of great
significance in tissue characterization [8], [7]. To be
classified as the resolvable periodic component, the scatterer
separation should be regular enough, and the repeat
distances should be large enough (compared to the length of
the pulse-echo impulse response) to be resolvable. The
Gamma distribution has proven to be a particularly useful
tool in describing the inter-scatterer spacing distribution
[12], [7], because of its versatility in producing scatterer
locations ranging from almost periodic to clearly
non-periodic.

The general mathematical description of the RPC
follows directly from the expressions derived for the case
of the MC component. The resolvable periodic component,
$y_r(t)$, can be written as:

$$y_r(t) = \Big[\sum_{p=1}^{N_r} v_p\,\delta(t - \theta_p)\Big] * h(t). \qquad (12)$$


Again, we assume: (B1) $\theta_i$ is uncorrelated to $v_j$, $i, j = 1, 2, \ldots, N_r$;
(B2) $\theta_i$, $i = 1, 2, \ldots$, form a stationary
process; (B3) $y_r(t)$ is a zero-mean process.

From (12) we can obtain, as in the case of the MC component,

$$c_3^{y_r}(\tau_1,\tau_2) = K_r f_r(\tau_1,\tau_2) * c_3^{h}(\tau_1,\tau_2), \qquad (13)$$

$$K_r = \sum_{p=1}^{N_r}\sum_{q=1}^{N_r}\sum_{s=1}^{N_r} E\{v_p v_q v_s\}, \qquad (14)$$

where the joint probability density function $f_r(\tau_1,\tau_2)$
now describes a (pseudo) periodic phenomenon, i.e.
$f_r(\tau_1,\tau_2)$ is a two-dimensional bed of spikes, whose
separation is equal to the mean scatterer spacing.

In the following we derive an expression for $f_r(\tau_1,\tau_2)$,
under the constraint that the resolvable periodic scatterers
are separated by a constant time interval, $T$, i.e.
the process is strictly periodic and the resolvable
periodicity (unknown) is $T$. Then the periodic component
$r_r(t)$ can be written as:

$$r_r(t) = \sum_{p=1}^{N_r} v_p(t)\,\delta(t - pT), \qquad (15)$$

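where $v_p(t)$ is the scatterer strength function, and is a
bounded function defined over the set of real numbers,
$\mathcal{R}(-\infty,\infty)$. The Fourier transform $R_r(\omega)$ of $r_r(t)$ can be
obtained as

$$R_r(\omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} V\!\Big(\omega - \frac{2\pi}{T}k\Big), \qquad (16)$$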

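where $V(\omega)$ is the Fourier transform of $v_p(t)$. Let the
region of support of the function $v_p(t)$ be $\{t : |t| < B\}$.
As $B \to \infty$, $V(\omega) \to \delta(\omega)$. As $B \to \infty$ the bispectrum
[15] $C_3^{r_r}(\omega_1,\omega_2)$ of $r_r(t)$ is given by:

$$C_3^{r_r}(\omega_1,\omega_2) = R_r(\omega_1)\,R_r(\omega_2)\,R_r(-\omega_1-\omega_2) = \Big(\frac{2\pi}{T}\Big)^{3} \sum_{k_1}\sum_{k_2}\sum_{k_3} \delta\Big(\omega_1 - \frac{2\pi}{T}k_1\Big)\,\delta\Big(\omega_2 - \frac{2\pi}{T}k_2\Big)\,\delta\Big(-\omega_2-\omega_1 - \frac{2\pi}{T}k_3\Big), \qquad (17)$$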
=  (18)
ki  k2 

For the case $V(\omega) \to \delta(\omega)$, from (12) and (18) we can
obtain the bispectrum of $y_r(t)$ as:

ki  k2 

O  Tf 

X%2  -  ^^2)}.  (19) 

The third-order cumulant sequence $c_3^{y_r}(\tau_1,\tau_2)$ of the
RPC can be obtained from (19) through an inverse
Fourier transform, i.e.,

$$c_3^{y_r}(\tau_1,\tau_2) = c_3^{h}(\tau_1,\tau_2) * \sum_{k_1}\sum_{k_2} \delta(\tau_1 - k_1T)\,\delta(\tau_2 - k_2T). \qquad (20)$$

Comparing  (14)  and  (20)  we  get: 

$$f_r(\tau_1,\tau_2) = \sum_{k_1}\sum_{k_2} \delta(\tau_1 - k_1T)\,\delta(\tau_2 - k_2T). \qquad (21)$$

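Finally, the third-order cumulant of the RF echo
equals:

$$\begin{aligned}
c_3^{y}(\tau_1,\tau_2) &= \{K_m f_m(\tau_1,\tau_2) + K_r f_r(\tau_1,\tau_2)\} * c_3^{h}(\tau_1,\tau_2) \\
&= \Big\{K_m f_m(\tau_1,\tau_2) + K_r \sum_{k_1}\sum_{k_2} \delta(\tau_1 - k_1T)\,\delta(\tau_2 - k_2T)\Big\} * c_3^{h}(\tau_1,\tau_2), \qquad (22)
\end{aligned}$$

when the inter-scatterer spacing of the RPC component
is a constant $T$.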
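
The bed of spikes in (22) suggests one simple way to read off $T$: look for the lag at which a slice of the sample third-order cumulant peaks. The sketch below is only an illustration under stated assumptions: it uses the diagonal slice $\tau_1 = \tau_2 = \tau$, an idealized noiseless spike train, and the hypothetical helper names `diagonal_slice` and `estimate_period`; it is not the estimation scheme of the paper.

```python
def diagonal_slice(x, tau):
    """Sample third-order cumulant c3(tau, tau) of x after mean removal."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    return sum(z[t] * z[t + tau] ** 2 for t in range(n - tau)) / n

def estimate_period(x, max_lag):
    """Return the lag in 1..max_lag with the largest diagonal-slice value."""
    lags = range(1, max_lag + 1)
    return max(lags, key=lambda tau: diagonal_slice(x, tau))

# Idealized resolvable-periodic component: unit spikes every 5 samples.
spikes = [1.0 if t % 5 == 0 else 0.0 for t in range(50)]
print(estimate_period(spikes, 12))
```

On real RF data the spikes are smeared by $c_3^h$ and mixed with the $K_m f_m$ term, so a practical estimator would need more care than this peak search.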

The periodicity can be estimated from (22). The
term $K_m f_m(\tau_1,\tau_2) * c_3^{h}(\tau_1,\tau_2)$ in Eq. (10) can be considered
as the third-order cumulant sequence of the signal

$$z(t) = e(t) * u(t), \qquad (23)$$

where $e(t)$ is a non-Gaussian noise process whose third-order
cumulants equal $K_m$, and $u(t) = g(t) * h(t)$ is a
linear time-invariant system. We call $g(t)$ the color of
the tissue response. Based on (10) the third-order cumulant
sequence $c_3^{z}(\tau_1,\tau_2)$ of $z(t)$ can be retrieved from
$c_3^{y}(\tau_1,\tau_2)$. Using methods of system reconstruction from
higher-order spectra, we can recover $u(t)$ from $c_3^{z}(\tau_1,\tau_2)$.

The quantities $u_i(t)$ and $u_j(t)$ obtained from two different
RF line segments “i” and “j” can be written as
$u_k(t) = h(t) * g_k(t)$, $k = i, j$, where $h(t)$ is invariant
between “i” and “j”; $g_i(t)$ and $g_j(t)$ respectively
represent the color of the tissue responses at “i” and “j”.
The quantities $g_k(t)$, $k = i, j$ carry information about
the scatterer spacings corresponding to the MC component,
and are also free from the effects of $h(t)$, making
them good candidates for a tissue signature. Using
$u_i(t)$ and $u_j(t)$ we form a tissue signature $S_{ij}(t)$ as:

$$S_{ij}(t) = u_i(t) * u_j^{-1}(t) = h(t) * g_i(t) * \{h(t) * g_j(t)\}^{-1} = g_i(t) * g_j^{-1}(t),$$

where $(\cdot)^{-1}$ denotes the convolutional inverse.

If the estimates are from two regions with similar
ultrasonic properties, $S_{ij}(t)$ will approach a Dirac delta
function. The closeness of $S_{ij}$ to a Dirac delta serves
as a measure of the similarity between two different
regions.
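
The signature $S_{ij} = u_i * u_j^{-1}$ can be sketched for causal, finite sequences by a deconvolution recursion (polynomial long division). This is a minimal illustration, assuming $u_j$ has a non-zero first sample; the names `deconvolve` and `distance_from_delta`, and the absolute-deviation measure of closeness to a delta, are hypothetical choices for the example.

```python
def deconvolve(u_i, u_j, length):
    """First `length` samples of s such that u_i = s * u_j (causal case)."""
    if u_j[0] == 0:
        raise ValueError("u_j[0] must be non-zero for this recursion")
    s = []
    for n in range(length):
        acc = u_i[n] if n < len(u_i) else 0.0
        # Subtract the contribution of already computed samples of s.
        for k in range(1, min(n, len(u_j) - 1) + 1):
            acc -= u_j[k] * s[n - k]
        s.append(acc / u_j[0])
    return s

def distance_from_delta(s):
    """Sum of absolute deviations of s from a unit impulse at n = 0."""
    return sum(abs(v - (1.0 if n == 0 else 0.0)) for n, v in enumerate(s))

same = deconvolve([1.0, 0.5, 0.25], [1.0, 0.5, 0.25], 5)
different = deconvolve([1.0, 1.0], [1.0, 0.5], 3)
print(distance_from_delta(same), distance_from_delta(different))
```

Identical inputs give a unit impulse (distance zero), while differing inputs give a non-trivial sequence, which mirrors the similarity interpretation above.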

3  Results 

We obtained RF ultrasound data corresponding to
clinical images of human livers containing tumors. An
example of such an image is shown in Fig. 1(a) where
the tumor is visible in the middle part of the image.
Using RF echoes inside and outside the known tumor
location, we estimated the periodicity of the scatterer
spacing (m). The histogram for m is shown in Fig. 1(b).
From Fig. 1 we conclude that m serves as a feature for
characterizing the tumor shown in Fig. 1(a). Figure
2 shows the color of the tissue responses estimated from
three different liver images where the periodicity failed
as a feature. Figs. 2(a), (b) and (c) show the color of
the tissue responses estimated inside tumors and Figs.
2(d), (e) and (f) show the corresponding ones estimated
outside the tumors. The solid lines indicate the average
estimate and the dotted lines indicate the average
± standard deviation. The color of the tissue response
compactly summarizes the correlation structure existing
among the scatterers of both short-range and long-range
order. It serves as a tissue signature which supplements
the periodicity, and can be measured even when
the periodicity is not defined.
Abstract

In this paper, we propose a new noise filtering
scheme that is based on higher-order statistics (H.O.S.)
for photographic images corrupted by signal-dependent
film grain noise. In addition, reliable estimation of
the noise parameter using H.O.S. is proposed. This
parameter estimation technique can be used to generate
film grain noise, which has applications in motion
picture and television productions. Simulation results
show that the proposed filter performs better than existing
methods which are based on second-order statistics.


1.  Introduction 

Noise suppression is a common application in image
processing. Traditionally the noise process is taken to
be additive white Gaussian and statistically independent
of the signal. However, in the case of photographic
images, it is well known that they contain film grain
noise that is signal-dependent, that is, the noise statistics
depend on the signal. The noise model describing
film grain noise and zero-mean Gaussian measurement
noise w has the form [1]:

$$r(x,y) = s(x,y) + k\,s^{p}(x,y)\,n(x,y) + w(x,y), \qquad (1)$$

where s is the noiseless image measured in density, k is
the scanning constant, p is an exponent that depends
on the film, and n is Gaussian noise with zero mean and unit
variance. Because conventional filtering techniques
assume a different noise model, they do not perform well
in this case.
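
Noise model (1) can be simulated directly, which is also the basic operation needed for synthetic film grain generation. The sketch below is a minimal illustration assuming a non-negative image stored as a list of rows; the function name `add_film_grain`, the default values and the seeding are hypothetical.

```python
import random

def add_film_grain(s, k, p=0.5, sigma_w=0.0, seed=0):
    """Return r = s + k * s**p * n + w for a non-negative image s.

    n is unit-variance Gaussian noise and w is Gaussian measurement
    noise with standard deviation sigma_w (both zero-mean).
    """
    rng = random.Random(seed)
    noisy = []
    for row in s:
        noisy.append([
            v + k * (v ** p) * rng.gauss(0.0, 1.0) + rng.gauss(0.0, sigma_w)
            for v in row
        ])
    return noisy

image = [[0.0, 1.0], [4.0, 9.0]]  # made-up density values
print(add_film_grain(image, k=0.1))
```

Because the grain term is scaled by $s^p$, dark pixels (zero density here) receive no grain while brighter pixels receive proportionally more, which is what makes the noise signal-dependent.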

Existing methods for suppressing signal-dependent
film grain noise include modifications of standard
techniques designed for additive noise. Examples are
the Wiener filter [1] and statistical estimators [2] for
the above noise model. These techniques yield better
performance than those assuming a false
signal-independent noise. However, the above methods
assume the parameter k is known a priori. Moreover, due
to the nonlinearity in the noise model, expressions for
the statistical estimators (MMSE and MAP) have a
complicated form even in the special case of a Gaussian
image s, and involve solving a polynomial equation and
numerical integration at every pixel.

In this paper, reliable estimation of the noise model
parameter using higher-order statistics (H.O.S.) is
considered. In this respect a new filter (Wiener type) based
on H.O.S. is proposed. Also, realistic film grain noise
generation, which has applications in television and
motion picture productions, becomes possible. Because
measurement noise is Gaussian, higher-order statistics
of the observed image r would contain contributions
from the non-Gaussian image s and
signal-dependent noise only, which leads to better parameter
estimation. Furthermore, since photographic images
are highly non-Gaussian and film grain noise is
nonlinearly related to the original image, a lot more
information can be extracted from their higher-order
statistics [3]. Filtering schemes based on H.O.S. can
give better performance.

2.  Design  of  Higher-Order  Statistics 
Based  Filter 

Assuming the proposed filter $h(x,y)$ to be a finite impulse
response (FIR) filter with a support region of

$$h(x,y) \neq 0, \quad a \le x \le b, \quad c \le y \le d, \qquad (2)$$

the filter coefficients $h(x,y)$ can be solved by minimizing
a higher-order statistics criterion that is an extension
of the mean square error (MSE) criterion used in
the correlation based Wiener filter. Let the error signal
$e(x,y)$ be defined as:

$$e(x,y) = s(x,y) - \sum_{i=a}^{b}\sum_{j=c}^{d} h(i,j)\, r(x-i,\, y-j), \qquad (3)$$

the proposed filter $h(x,y)$ is designed by minimizing
the following criterion [4]:

$$J_M(h) = \sum_{\alpha=-\infty}^{\infty}\sum_{\beta=-\infty}^{\infty} \mathrm{Cum}_M\big(e_{xy},\, e_{xy},\, r_{x-\alpha,y-\beta},\, \ldots,\, r_{x-\alpha,y-\beta}\big), \qquad (4)$$

with $e_{xy} = e(x,y)$ and $h(x,y)$ being the optimum filter
based on the cumulant based criterion $J_M$.

3. Cumulant-Based Wiener-Hopf Equation

To compute the optimum filter coefficients, we can
extend the correlation based Wiener-Hopf equation and
the orthogonality principle to higher-order statistics.
By using the idea described in [4], let $h(x,y)$ be the
filter satisfying the following cumulant based orthogonality
condition:

$$\sum_{\alpha=-\infty}^{\infty}\sum_{\beta=-\infty}^{\infty} \mathrm{Cum}_M\big(e_{xy},\, r_{x-i,y-j},\, r_{x-\alpha,y-\beta},\, \ldots,\, r_{x-\alpha,y-\beta}\big) = 0. \qquad (5)$$

Then it can be shown that h is the optimum filter associated
with the criterion $J_M$. To derive the cumulant
based Wiener-Hopf equation, we start with the orthogonality
condition (Eq. (5)), and substitute the expression
for $e(x,y)$ into Eq. (5) to give:

$$C_{sr}(p,q) = \sum_{i=a}^{b}\sum_{j=c}^{d} h(i,j)\, C_{rr}(p-i,\, q-j), \qquad (6)$$

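where $C_{sr}$ and $C_{rr}$ are defined as

$$C_{sr}(i,j) = \sum_{\alpha=-\infty}^{\infty}\sum_{\beta=-\infty}^{\infty} \mathrm{Cum}_M\big(s_{xy},\, r_{x-i,y-j},\, r_{x-\alpha,y-\beta},\, \ldots,\, r_{x-\alpha,y-\beta}\big), \qquad (7)$$

$$C_{rr}(i,j) = \sum_{\alpha=-\infty}^{\infty}\sum_{\beta=-\infty}^{\infty} \mathrm{Cum}_M\big(r_{xy},\, r_{x-i,y-j},\, r_{x-\alpha,y-\beta},\, \ldots,\, r_{x-\alpha,y-\beta}\big). \qquad (8)$$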


By substituting estimates of $C_{rr}$ and $C_{sr}$ and forming
a linear system of equations, the filter coefficients $h(i,j)$
can be solved. Note that the above derivation can be
applied similarly to moments, i.e., a moment based criterion
leading to a moment based orthogonality condition
and a moment based Wiener-Hopf equation.
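
Forming and solving the linear system can be sketched in one dimension, where the Wiener-Hopf relation reads $C_{sr}(p) = \sum_{i=a}^{b} h(i)\,C_{rr}(p-i)$ for $p = a, \ldots, b$. This is a minimal illustration under stated assumptions: the 1-D reduction, the helper names `solve_linear` and `wiener_hopf_1d`, and the made-up statistics are hypothetical; the 2-D case stacks the $(i,j)$ indices into one vector in the same way.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def wiener_hopf_1d(c_sr, c_rr, a, b):
    """Solve c_sr(p) = sum_{i=a}^{b} h(i) c_rr(p - i) for p = a..b.

    c_sr and c_rr are callables taking an integer lag.
    """
    idx = list(range(a, b + 1))
    matrix = [[c_rr(p - i) for i in idx] for p in idx]
    rhs = [c_sr(p) for p in idx]
    return solve_linear(matrix, rhs)

# Made-up statistics: when C_sr equals C_rr the filter is the identity.
table = {0: 2.0, 1: 0.5, -1: 0.5}
c = lambda lag: table.get(lag, 0.0)
print(wiener_hopf_1d(c, c, -1, 1))
```

The identity-filter case is a convenient sanity check: if the cross statistics equal the observed statistics, no filtering is needed and the solution is a unit impulse at lag zero.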

4. Estimation of Higher-Order Statistics

The use of Eq. (6) requires that $C_{sr}$ and $C_{rr}$ are
known. In practice, higher-order cumulants are estimated
by replacing the expectation operator by sample
averaging over the data. Estimating $C_{rr}$ is not
difficult since we have access to the observed signal.
However, $C_{sr}$ is not easy to obtain unless we know the
signal exactly, which is impossible. Thus we need to
determine the relationships of $C_{sr}$ and $C_{rr}$ with the
original signal statistics, which are assumed known.

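By substituting the noise model (Eq. (1)) into the
definitions of $C_{rr}$ and $C_{sr}$, we have for $M = 3$ and
$p = 0.5$:

$$C_{rr}(i,j) = C_{sr}(i,j) + k^2\big(R_s(i,j) - m_s^2\big) + k^2 \sum_{\alpha}\sum_{\beta} \big(R_s(\alpha,\beta) - m_s^2\big)\,\delta(i,j) \qquad (9)$$

and

$$C_{sr}(i,j) = \sum_{\alpha}\sum_{\beta} \mathrm{Cum}\big(s_{x,y},\, s_{x-i,y-j},\, s_{x-\alpha,y-\beta}\big) + k^2\big(R_s(i,j) - m_s^2\big) + E[s(x,y) - m_s] \sum_{\alpha}\sum_{\beta} R_w(i-\alpha,\, j-\beta), \qquad (10)$$

where $R_s$ and $m_s$ are the autocorrelation and the mean
of the signal s respectively:

$$R_s(i,j) = E[s(x,y)\,s(x-i,\, y-j)]. \qquad (11)$$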

5.  Parameter  Estimation 

The calculations of Eqs. (9) and (10) require that
the constant k be known. However, when this information
is not available, we must estimate the constant
from the observed image statistics and the a priori ideal
image statistics. In the case of p = 0.5, the variance,
skewness, and kurtosis of the received image are related
to those of the original image by the following equations:

$$\sigma_r^2 = \sigma_s^2 + k^2 E[s] + \sigma_w^2 \qquad (12)$$

$$c_3^r = c_3^s + 3k^2\sigma_s^2 \qquad (13)$$

$$c_4^r = c_4^s + 6k^2 c_3^s + 3k^4\sigma_s^2 \qquad (14)$$

The value of k can then be solved by substituting the
statistics of the observed image (which can be estimated)
and the a priori image statistics ($\sigma_s^2$, $c_3^s$ and
$c_4^s$) into any of the above equations. Note from the
above equations that the use of higher-order statistics
in the presence of Gaussian measurement noise leads to
better estimation of k, as cumulants of Gaussian noise
are identically zero.
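
Solving for k from the second- and third-order relations can be sketched as follows. This is a minimal illustration assuming p = 0.5 and the forms $\sigma_r^2 = \sigma_s^2 + k^2E[s] + \sigma_w^2$ and $c_3^r = c_3^s + 3k^2\sigma_s^2$; the function names and the numerical values are hypothetical.

```python
import math

def k_from_variance(var_r, var_s, mean_s, var_w=0.0):
    """Estimate k from the variance relation; var_w is the assumed
    measurement-noise variance (zero if unknown)."""
    k_sq = (var_r - var_s - var_w) / mean_s
    if k_sq < 0:
        raise ValueError("statistics imply a negative k**2")
    return math.sqrt(k_sq)

def k_from_third_cumulant(c3_r, c3_s, var_s):
    """Estimate k from the third-order cumulant relation."""
    k_sq = (c3_r - c3_s) / (3.0 * var_s)
    if k_sq < 0:
        raise ValueError("statistics imply a negative k**2")
    return math.sqrt(k_sq)

# Made-up signal statistics with true k = 0.1 and noise variance 0.03.
var_s, mean_s, c3_s, k_true, var_w = 2.0, 4.0, 1.0, 0.1, 0.03
var_r = var_s + k_true ** 2 * mean_s + var_w
c3_r = c3_s + 3 * k_true ** 2 * var_s
print(k_from_variance(var_r, var_s, mean_s))       # ignores var_w: biased
print(k_from_third_cumulant(c3_r, c3_s, var_s))    # unaffected by var_w
```

When the measurement-noise variance is unknown and set to zero, the variance-based estimate is biased upward, whereas the third-order relation does not involve $\sigma_w^2$ at all.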

6.  Film  Grain  Noise  Generation 

As outlined in the Introduction, film grain noise generation
has applications in television and motion picture
productions since digitized film images, video images
and computer generated images are routinely combined
into one frame. In this process artificial noise is added
to the video and computer generated images to match
the grain pattern of the film. To generate the right
amount of artificial film grain noise, the noise parameter
k must be known.

Using the above method to estimate the noise parameter
k would require statistics of the corrupted and
ideal images. When k is solved, a noise image is generated
and added to the computer generated image according
to Eq. (1), but without the measurement noise w.
In cases where the a priori ideal image statistics are
not known, an approximate solution was proposed in
[5] which uses statistics of a filtered image as the ideal
image statistics. The observed image r is processed using
a sub-optimal filtering technique that requires no
knowledge of the noise statistics.

7.  Simulation  Results 

In this section, the above proposed methods for estimation
of the parameter k, and signal-dependent film
grain noise removal based on H.O.S. are applied. Two
test images of size 256 x 256 were used: Lenna and
Mountain. Test image ‘Mountain’ is shown in Fig. 1.

To test the validity of parameter estimation of k, a
number of simulations were performed for the following
two cases:

•  signal-dependent noise only

•  mixture of signal-dependent / signal-independent
noise

Figure 1. Test image: ‘Mountain’.

Signal-dependent film grain noise and Gaussian measurement
noise are added to the image ‘Lenna’. Sample
cumulants are calculated using the following relationships:

$$c_1 = m_1 \qquad (15)$$

$$c_2 = m_2 - (m_1)^2 \qquad (16)$$

$$c_3 = m_3 - 3m_1m_2 + 2(m_1)^3 \qquad (17)$$

$$c_4 = m_4 - 4m_1m_3 - 3(m_2)^2 + 12(m_1)^2m_2 - 6(m_1)^4 \qquad (18)$$

The quantities $m_1$, $m_2$, $m_3$, and $m_4$ are estimated from
an M x N image using sample averaging. The signal
statistics $\sigma_s^2$, $c_3^s$, and $c_4^s$ of the image ‘Lenna’ are known
a priori and are used to solve for k. The parameter p
was fixed to be 0.5 throughout the experiments since
this is typical for a variety of film stocks. Because the
variance of the signal-independent noise is assumed unknown,
k is solved using Eq. (12) with zero measurement
noise variance. A value of k = 0.1 was selected
for moderate noise corruption. The value of k determines
the degree of degradation, as can be seen from
the variance of the signal-dependent noise term:

$$\mathrm{Var}\big[k\,s^{p}\,n\big] = k^2 E\big[s^{2p}\big]. \qquad (19)$$
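
The moment-to-cumulant relations (15) to (18) translate directly into code. The sketch below is a minimal illustration assuming the image has been flattened into a list of pixel values; the function name `sample_cumulants` is hypothetical.

```python
def sample_cumulants(data):
    """Return (c1, c2, c3, c4) from sample moments about the origin."""
    n = len(data)
    m1 = sum(data) / n
    m2 = sum(v ** 2 for v in data) / n
    m3 = sum(v ** 3 for v in data) / n
    m4 = sum(v ** 4 for v in data) / n
    c1 = m1
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    c4 = m4 - 4 * m1 * m3 - 3 * m2 ** 2 + 12 * m1 ** 2 * m2 - 6 * m1 ** 4
    return c1, c2, c3, c4

# Flatten a small made-up image into a list of pixel values.
image = [[1.0, 2.0], [3.0, 4.0]]
pixels = [v for row in image for v in row]
print(sample_cumulants(pixels))
```

For these four symmetric values the third cumulant vanishes and the fourth is negative, as expected for a flat, light-tailed sample.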

Fifty  independent  runs  were  performed,  with  the  re¬ 
sults  summarized  below. 


Table 1. Estimation of k with true value k = 0.1.
Estimated k (mean ± standard deviation):

Noise level    2nd order          3rd order          4th order
0              0.0999 ± 0.0015    0.0999 ± 0.0024    0.0998 ± 0.0033
[illegible]    0.1299 ± 0.0014    [illegible]        [illegible]
[illegible]    0.1939 ± 0.0013    [illegible]        [illegible]
[illegible]    0.2681 ± 0.0015    [illegible]        [illegible]
[illegible]    0.3465 ± 0.0014    0.0990 ± 0.0096    0.0984 ± 0.0159

The advantage of using H.O.S. in estimation is evident from the tables. Second-order statistics result in high bias and low variance, whereas estimation using H.O.S. has low bias and higher variance. It is clear that estimating k using H.O.S. is better than using second-order statistics in the presence of Gaussian measurement noise.

For noise filtering, again two different cases were investigated:

• signal-dependent noise only

• mixture of signal-dependent / signal-independent noise


we have for a mixture of noise:

$$C_{sr}(\text{mixture of noise}) \approx C_{sr}(\text{film grain noise only}) \tag{24}$$

and

$$C_{rr}(\text{mixture of noise}) \approx C_{rr}(\text{film grain noise only}) \tag{25}$$

thus the cumulant based criterion cannot recognize the Gaussian measurement noise. The filter performs as if only film grain noise were present.


A value of k = 0.1 was used. For a mixture of noise, the variance of the measurement noise, $\sigma^2$, was chosen to be 0.005.

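The criteria used in evaluating the performance were 1) signal-to-noise ratio (SNR), 2) mean absolute error (MAE), and 3) mean square error (MSE), which is similar to SNR. These are defined below for an image of size M x N:

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} s(x,y)^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ s(x,y) - \hat{s}(x,y) \right]^2} \tag{20}$$

$$\mathrm{MAE} = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left| s(x,y) - \hat{s}(x,y) \right| \tag{21}$$

$$\mathrm{MSE} = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ s(x,y) - \hat{s}(x,y) \right]^2 \tag{22}$$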


where $s$ and $\hat{s}$ are the ideal and estimated images, respectively. A filter size of 3 x 3 was chosen. The Wiener filter designed with the usual correlation based criterion [1] (designated $M_2$) and the filters designed with the two proposed higher-order statistics based criteria (third-order moment $M_3$ and third-order cumulant $C_3$) are compared.
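The three evaluation criteria can be sketched as follows. This is a minimal illustration, assuming the images are given as equally sized lists of rows and that the SNR uses the ideal signal power in the numerator; the function names are hypothetical.

```python
import math


def _pairs(ideal, estimate):
    """Yield (ideal pixel, estimated pixel) pairs from two same-size images."""
    for row_i, row_e in zip(ideal, estimate):
        for s, e in zip(row_i, row_e):
            yield s, e


def snr_db(ideal, estimate):
    """Ideal signal power over error power, in dB."""
    signal = sum(s * s for s, _ in _pairs(ideal, estimate))
    error = sum((s - e) ** 2 for s, e in _pairs(ideal, estimate))
    return 10.0 * math.log10(signal / error)


def mae(ideal, estimate):
    """Mean absolute error over all M x N pixels."""
    diffs = [abs(s - e) for s, e in _pairs(ideal, estimate)]
    return sum(diffs) / len(diffs)


def mse(ideal, estimate):
    """Mean square error over all M x N pixels."""
    diffs = [(s - e) ** 2 for s, e in _pairs(ideal, estimate)]
    return sum(diffs) / len(diffs)


ideal = [[1.0, 2.0], [3.0, 4.0]]
estimate = [[1.0, 2.0], [3.0, 3.0]]
metrics = (snr_db(ideal, estimate), mae(ideal, estimate), mse(ideal, estimate))
```

With these definitions the SNR equals 10 log10 of the mean squared ideal intensity divided by the MSE, which is why the two measures rank the filters in the same order.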

A  number  of  observations  can  be  made: 


• On average, higher-order statistics based filters achieved better SNR than the correlation based filter by 1 dB. This improvement may be due to the fact that more information about the image statistics was utilized.


• The performance of the cumulant based filter depends heavily on the properties of the image. For the third-order cumulant, if the distribution of an image is close to Gaussian or is symmetric, then it is not appropriate to use the cumulant based filter. Table 5 shows the statistics of the two images. It can be observed that ’Mountain’ has the lowest skewness, thus the third-order cumulant based filter did not perform well. If the fourth-order cumulant is used, the filter should perform well in the case of film grain noise only, as indicated in Table 4.

To test the noise generation procedure, the image ’Lenna’ was used for noise generation. The corrupted image was filtered using the method described in [5]. Then k was computed using fourth-order statistics of the two images. Although k can be solved by matching their variances, it was found that the variance of the filtered image is lower than that of the corrupted image because edges are blurred to some extent. Thus using variance to obtain k would lead to over-estimation, and the final image would be too noisy. To compare the noise level of the original corrupted image and the final image, SNR was used. For the original corrupted image the signal power is the ideal signal power, whereas the signal power of the final image is that of the filtered image. It can be seen from Table 6 that the noise levels in the two images (noise-added and original corrupted images) are about the same.


8. Conclusions


• The cumulant based filter performed as well as the higher-order moment and correlation based filters in the case of signal-dependent noise only. However, for a mixture of noise, the cumulant based filter is not suitable. By examining the cumulant based Wiener-Hopf equation in Eq. (6):

$$C_{sr}(p, q) = \sum_{i=a}^{b} \sum_{j=c}^{d} h(i, j)\, C_{rr}(p - i,\, q - j) \tag{23}$$


This paper presents an H.O.S. based filter for filtering images corrupted by signal-dependent film grain noise. In addition, estimation of the noise parameter using H.O.S. is proposed and successfully applied in film grain noise generation. Simulation results show that the performance of this filter is better than that of a filter based on second-order statistics, and that parameter estimation using H.O.S. is more reliable in the presence of Gaussian measurement noise.
Table 2. Test image ’Lenna’ with signal-dependent noise only (k=0.1).

              SNR (dB)    MAE          MSE
unfiltered    16.7591     4.6375e-2    3.6820e-3
M2            19.8458     3.1849e-2    1.8089e-3
M3            19.9902     3.1212e-2    1.7498e-3
C3            20.1143     3.0150e-2    1.7005e-3

Table 3. Test image ’Lenna’ with mixture of signal-dependent noise (k=0.1) and measurement noise ($\sigma^2$ = 0.005).

              SNR (dB)    MAE          MSE
unfiltered    13.0270     7.3938e-2    8.6957e-3
M2            17.7796     4.1604e-2    2.9110e-3
M3            17.8464     4.1266e-2    2.8666e-3
C3            13.7575     6.7819e-2    7.3494e-3

Table 4. Test image ’Mountain’ with signal-dependent noise only (k=0.1).

              SNR (dB)    MAE          MSE
unfiltered    15.9772     4.5280e-2    3.4005e-3
M2            20.9042     2.5644e-2    1.0936e-3
M3            21.6888     2.1905e-2    9.1283e-4
M4            21.5968     2.2600e-2    9.3237e-4
C3            20.7989     2.3356e-2    1.1204e-3
C4            21.7648     2.2393e-2    8.9398e-4

Table 5. Image statistics of various test images.

Image       mean         variance     skewness
Lenna       3.6363e-1    4.2353e-2    7.2266e-3
Mountain    3.3807e-1    2.0373e-2    1.0025e-3

Table 6. SNR of noise-generated images using different image statistics (original SNR is 16.7620 dB).

statistics      SNR (dB)
second order    16.0929
third order     15.8685
fourth order    16.9721

Figure 2. Corrupted ’Mountain’ with k = 0.1.


Figure  3.  Filtered  ’Mountain’  using  M2  criterion. 
Abstract

In this article, we present results obtained on industrial data with source separation techniques for instantaneous mixtures. We introduce three applications developed to monitor Electricite de France civil works and power plants.

The first application concerns the monitoring of nuclear power plants. Each internal component generates specific vibration modes, and the “neutron noise” is a combination of all the modes generated. The aim of this study is to separate these independent vibration modes.

The second application concerns the supervision of dams: it consists in separating the various types of motion of a dam according to their physical origin.

The third application concerns non-destructive testing of steam generators in nuclear power plants. The aim is to reduce the flattening noise. The classical methods operate only when a noise reference is available. We propose to use a multi-sensor approach with blind separation methods, for which no noise reference is necessary.

Given the characteristics of the signals, we get better performance using a second-order statistics algorithm than a higher-order statistics algorithm.

1.  Monitoring  of  nuclear  power  plants 

1.1 Introduction

Nuclear reactors are composed of many elements which vibrate and deform. Efficient monitoring could be achieved if we were able to separate the vibration signature of the thermal shield from that of the core barrel. These signatures are characterized by vibratory modes: « mode 1 » (around 7.5 Hz), related to the swinging of the core barrel, and « mode 2 » (around 11.5 Hz), due to the deformation of the thermal shield. In many cases, however, because of its amplitude or its frequency deviation (it can shift from 7.5 to 11.5 Hz), mode 1 totally masks mode 2.


We are dealing here with several Gaussian vibratory sources with different, and therefore independent, physical origins. We have four sensors positioned at 90° intervals around the core barrel. Each delivers, in theory, a linear combination of the various vibratory sources. The propagation medium cannot be modeled, due to its complexity.

1.2 Techniques used

Because  many  vibratory  diagnostic  methods  rely  on 
spectra,  the  objective  is  to  estimate  the  spectrum  of  each 
source. 

We present two methods: interspectral matrices, which enable estimation of the spectral density of each source, and the SOBI algorithm (Second Order Blind Identification), which performs a blind separation of the temporal signatures of each source.

• Interspectral matrices are used to estimate source spectra. By diagonalizing the matrix at one frequency, we estimate the power spectral density (PSD) values of the sources (equal to the eigenvalues) [1]. Simply sorting the eigenvalues in descending order does not show the evolution of source power, because it cannot handle crossings of the source spectra.

We  therefore  propose  an  alternative  method,  based 
on  phase  and  amplitude  continuity.  We  assume  that 
the  eigenvalues  and  the  signal  sub-space  of  each 
source  are  continuous  in  frequency.  We  then  use  an 
algorithm  which  minimizes  the  discontinuities  of  the 
amplitude  of  the  eigenvalues  and  the  eigenvector 
phase  to  find  the  ordered  values  of  the  sources  in 
frequency  [6]. 

• To separate temporal signals, we then use the SOBI algorithm [4]. The method exploits the time coherence of the source signals. The algorithm is based on a ‘joint diagonalization’ of correlation matrices. Further information can be found in [5].
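The second-order idea behind this kind of separation can be sketched on a toy two-sensor problem. The code below is not SOBI itself: it whitens the observations and then diagonalizes a single lagged correlation matrix (the simpler single-lag scheme usually called AMUSE), whereas SOBI jointly diagonalizes several lags. All function names, the mixing coefficients and the test sources are assumptions made for the illustration.

```python
import math


def sym_eig2(a, b, c):
    """Eigen-decomposition of [[a, b], [b, c]].
    Returns (l1, l2, theta); eigenvectors are (cos, sin) and (-sin, cos)."""
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    ct, st = math.cos(theta), math.sin(theta)
    l1 = a * ct * ct + 2.0 * b * ct * st + c * st * st
    l2 = a * st * st - 2.0 * b * ct * st + c * ct * ct
    return l1, l2, theta


def lagged_cov(x, y, lag):
    """Sample correlation of zero-mean sequences x(t) and y(t + lag)."""
    n = len(x) - lag
    return sum(x[t] * y[t + lag] for t in range(n)) / n


def separate_two(x1, x2, lag=1):
    """Single-lag second-order separation of two instantaneous mixtures."""
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    x1 = [v - mean1 for v in x1]
    x2 = [v - mean2 for v in x2]
    # Step 1: whiten using the zero-lag correlation matrix.
    l1, l2, th = sym_eig2(lagged_cov(x1, x1, 0), lagged_cov(x1, x2, 0),
                          lagged_cov(x2, x2, 0))
    c, s = math.cos(th), math.sin(th)
    z1 = [(c * a + s * b) / math.sqrt(l1) for a, b in zip(x1, x2)]
    z2 = [(-s * a + c * b) / math.sqrt(l2) for a, b in zip(x1, x2)]
    # Step 2: diagonalize the symmetrized lagged correlation of the whitened data.
    r12 = 0.5 * (lagged_cov(z1, z2, lag) + lagged_cov(z2, z1, lag))
    _, _, ph = sym_eig2(lagged_cov(z1, z1, lag), r12, lagged_cov(z2, z2, lag))
    c, s = math.cos(ph), math.sin(ph)
    y1 = [c * a + s * b for a, b in zip(z1, z2)]
    y2 = [-s * a + c * b for a, b in zip(z1, z2)]
    return y1, y2


# Two sources with different spectra, mixed instantaneously on two sensors.
n = 2000
s1 = [math.sin(0.3 * t) for t in range(n)]
s2 = [math.sin(1.1 * t + 0.5) for t in range(n)]
x1 = [a + 0.6 * b for a, b in zip(s1, s2)]
x2 = [0.4 * a + b for a, b in zip(s1, s2)]
y1, y2 = separate_two(x1, x2)
```

The separation relies only on the sources having different lagged correlations (different spectra), so it applies to Gaussian sources as well; the outputs are recovered up to order, sign and scale.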

1.3  Experimental  results 

These  two  methods  have  now  been  applied  to  real 
data.  The  first  (figure  2,  above)  gives  successful  results, 
and  the  PSDs  of  the  physical  phenomena  are  well 
estimated.  With  the  second  method,  the  SOBI  algorithm 
gives  a  good  estimation  of  real  signals.  The  PSDs  of  the 
separated  signals  are  consistent  with  those  found  using 
the  first  method  (figure  2, below). 

An  examination  of  PSDs  reveals  several  vibratory 
modes : 

•  the  two  highest  eigenvalues  show  a  major  contribution 
around  7.5  Hz  (core  barrel  mode  1),  and  are  related  to 
swinging  of  the  core  barrel ; 

•  the  two  lowest  represent  one  source  related  to  the 
thermal  shield  (mode  2,  at  11.5  Hz) ; 

•  an  interfering  source  which  most  often  presents  two 
modes:  one  around  8  Hz  and  the  second  around 
10  Hz. 

1.4  Conclusion 

The algorithms used are sub-optimal because:

• for the first one, the eigendecomposition of the interspectral matrix has no unique solution. The solution is known only up to a unitary matrix (rotation matrix). In our case, we choose a classic seismic phase constraint [8] [9]: we decide that the first component of all eigenvectors will have zero phase. But such an approach significantly limits the generality of this algorithm.

• the second algorithm assumes an instantaneous linear mixture, even though the mixture is certainly convolutive.

However, for Gaussian sources, this method recovers mode 2 correctly, and eliminates the “parasite modes” found with traditional methods.


values of the vibratory sources

2. Dams supervision

2.1. Introduction

In spite of their apparent immobility, dams move under working conditions. Monitoring of civil works allows a better estimate of their condition. We should therefore distinguish between the displacement of the dam resulting from solicitations (external loads) and that which indicates damage to the structure.

Dams are equipped with radial and tangential displacement sensors, flow sensors, strain sensors, etc. The recorded signals contain, on the one hand, information about the ageing of the dam and, on the other hand, the response of the dam to external solicitations. There are two kinds of solicitations:

• mechanical solicitations corresponding to the upward thrust generated by the retained water;

• thermal solicitations due to temperature changes.

Our study is devoted to displacement measurements.


Figure  3 :  Contribution  to  displacement  of  a  dam 


We suppose that the displacement of a dam is an additive mixture of the responses of the dam to these three phenomena (assumed independent and non-Gaussian), to which measurement noise is added. The displacement measurements are very few (800 samples over around 20 years in the best of cases), and their sampling is irregular (the interval can vary from 3 to 20 days) and not evenly distributed. We note that the variations of the two solicitations are very different: whereas the mechanical solicitation varies every day, the thermal solicitation presents mainly yearly and monthly periods.


Figure 4 : Radial displacement from pendulum

2.2. Methods used

We use a source separation algorithm that aims to recover the various contributions from signals observed on a network of sensors.

As the sources are non-Gaussian, we could have used an algorithm based on higher-order statistics. However, because we have very few data records at our disposal (from 150 to 800, depending on the case), we chose the Second Order Blind Identification algorithm, SOBI. Indeed, a good estimate of correlation requires fewer samples than the estimation of higher-order cumulants.

We would like to stress that the hypothesis of an instantaneous mixture allows classic source separation methods to be used, even for irregularly sampled signals. SOBI uses the independence of the sources, which is expressed technically as a diagonal structure of the correlation matrices. The estimate of the correlation matrices is certainly biased, but their diagonal structure is preserved, so that the algorithm remains effective.

2.3.  Application  to  displacement  of  concrete  dam 

Using five sensors (placed one above the other up the height of the dam) allows three solicitations to be extracted, which constitutes a good result.

Daily measurements of the temperature and of the level of the water retained in the reservoir, which we obtained for this dam, allow the results to be checked.


Figure  5  :  Decomposition  of  a  dam  displacement 
Figure 5 presents the different contributions compared with the level of the retained water and the temperature.

reference a: displacement due to ageing

reference b: displacement due to the mechanical contribution (below) and level of the retained water (above)

reference c: displacement due to the thermal contribution (below) and temperature (above)

These curves highlight a cause and effect relationship between the constraints (temperature and water level) and their contributions to the motion of the dam: mechanical and thermal displacement. In addition, the effect of time (ageing) makes it possible to follow the evolution of the dam as it moves downstream.

3.  Testing  on  steam  generators 

This application concerns the non-destructive testing of steam generators in nuclear power plants. The aim is to detect the appearance of cracks at the surface and in the depth of the tubes.

Because of the manufacturing process, data records contain mainly noise. We propose to use an array processing approach to restore the signals.

3.1.  Introduction 

The steam generators of nuclear power plants are tested with eddy current probes. These probes are sensitive to every unevenness of the material and detect the potential presence of defects that disturb the evenness of the tube.

The surface of these tubes (manufactured by cold-rolling) has undulations (normal unevenness) which generate a so-called flattening noise that can disturb the monitoring by hiding defects. It is essential to eliminate this noise in order to make the monitoring possible.

Several complex-valued monitoring measurements are acquired simultaneously. The most advanced of the methods developed up to now, particularly adaptive filtering, are based on the use of a noise reference [2]. In practice, even if some measurements contain mostly noise, we have no pure noise reference, which limits the performance of this class of methods.

We propose to use source separation methods here. We consider that several types of unevenness (defect, flattening noise) are present in the steam generators and are mixed in the data record. These irregularities can be treated as sources, and we can suppose that they are statistically independent of each other. Array processing, by means of separation methods, makes it possible to separate the contributions of the defects and of the flattening noise in the monitoring signal.


3.2.  Separation  algorithm 

Tests with algorithms based on higher-order statistics [3] [7] and on second-order statistics [4] were successful.

The results presented in the following paragraph were obtained with Cardoso’s algorithm [7].

3.3  Experimental  results 

The results show an improvement of the signal-to-noise ratio (SNR). Some hidden defects can thus be detected (fig. 7). We used the real part of two complex measurements (on the left-hand side). The processing separates the useful signal and the noise (on the right-hand side). The defect, which was hidden in the raw signal, is detected.

To sum up, it is possible to eliminate the flattening noise by source separation. This processing can clearly improve the signal-to-noise ratio and makes small defects visible.
Figure 7 : Estimation of flattening noise and defects

4. Conclusion


Mixtures appearing during the non-destructive testing of E.D.F.'s civil works are more or less convolutive. Nevertheless, we tried to apply to them source separation techniques designed for instantaneous mixtures. The tests with three different methods yield conclusive results.

The SOBI algorithm was more suitable for the data records than the higher-order statistics methods. Indeed, the sources had temporal coherence and different spectra. Furthermore, SOBI can handle Gaussian sources (first example: vibratory mode separation) and short signals (a few hundred points for the dam displacement measurements). It remains reliable because it is based on second-order statistics. Higher-order statistics enlarge the signal's representation space. We find the same idea in the SOBI algorithm, which uses the temporal dimension (source coherence) to enlarge the representation space.

At present, a study of new source separation methods suitable for convolutive mixtures is in progress. The selected methods are based on higher-order statistics. We plan to test them and to estimate their performance for our applications.
Abstract

This  paper  explores  the  structure  of  time-frequency 
representations  (TFR)  in  a  discrete  time  setting.  We 
define  a  proper  TFR  to  be  a  function  of  the  signal  that 
has  natural  time-  and  frequency-shift  properties.  We 
then  derive  the  basic  structure  of  a  proper  TFR  and 
argue  for  a  quadratic  TFR  as  the  simplest  form  of  a 
proper  TFR. 


1.  Introduction 

Time-Frequency Representations (TFRs) have been the subject of renewed interest in recent years. The variety of techniques which have been studied makes it somewhat difficult to provide a complete classification. It is apparent that the largest number of publications focuses on a class of TFRs which are quadratic (or bilinear) functions of the signal, commonly called “the Cohen class” [1]. Various non-quadratic techniques, such as Cohen class TFRs with data dependent kernels (which make the TFR non-quadratic) [2], data-adaptive representations [3, 4], and positive time-frequency distributions (TFDs) [5]-[6] have also been proposed.

When  looking  at  this  variety  of  TFRs  the  question 
naturally arises: what is a “proper” time frequency
representation?  Or  should  we  accept  any  function  of 
time  and  frequency  constructed  from  the  signal,  as  a 
valid  TFR?  If  not,  then  what  are  the  properties  which 
a  function  should  have,  in  order  to  qualify  it  as  a  TFR, 
and  what  is  the  minimal  set  of  properties  required  to 
uniquely  determine  its  form?  These  questions  seem  to 
have  received  relatively  little  attention  in  the  literature. 

In this paper we attempt to answer these questions by defining a proper TFR to be a function of the signal which has natural time-shift, frequency-shift, scaling and non-negativity properties. We are able to show that these properties impose a particular structure on any proper TFR. In effect, we are able to derive a “representation theorem” for proper TFRs, thereby extending the philosophy of [7] from time-invariant spectrum analysis to time-varying spectrum analysis.

The  organization  of  the  paper  is  as  follows.  We 
start  in  section  2  with  a  general  characterization  of 
proper  TFRs.  In  section  3  we  describe  the  time  and 
frequency  response  of  a  proper  TFR.  Section  4  presents 
the  quadratic  proper  TFR.  In  section  5  we  derive  the 
form  of  a  quadratic  proper  TFR  for  the  case  of  finite 
data  sequences. 

2.  The  General  Form  of  TFRs 

Let $s(t)$, $t \in I$, where $I$ is the set of all integers, be a complex discrete time signal. We will denote by $\mathbf{s}$ the infinite dimensional vector whose elements are $s(t)$. Without loss of generality we may view this signal as samples of an underlying bandlimited continuous-time signal, sampled at 1 second intervals. We will assume that the signal is sampled at a sufficiently high rate to avoid aliasing.

Let $P(t, w)$ be any function of time and frequency, constructed from the signal. The dependence of $P(t, w)$ on the signal is not shown explicitly for notational convenience. The frequency $w$ is assumed for the moment to be continuous, corresponding to the Discrete-Time Fourier Transform. In other words, $P(t, w)$ is defined on a discrete time grid, but a continuous frequency grid, $-\pi < w < \pi$.

We will say that a function $P(t, w)$ is a proper TFR if the following properties hold for any signal $s(t)$.

1. Frequency-shift, or modulation property - If $s(t)$ is replaced by $s(t)e^{jw_0 t}$, then $P(t, w)$ becomes $P(t, w - w_0)$, for any frequency $w_0$.

2. Time-shift property - If $s(t)$ is replaced by $s(t - t_0)$, where $t_0$ is an arbitrary integer, then $P(t, w)$ becomes $P(t - t_0, w)$, for any delay $t_0$.

3. Scaling property - If $s(t)$ is replaced by $cs(t)$, where $c$ is a complex constant, then $P(t, w)$ becomes $|c|^2 P(t, w)$.

4. Non-negativity property - $P(t, w) \geq 0$.

The first property is tied directly to the basic notion of frequency, and could be used as an alternative way of defining frequency. Without this property the variable $w$ would not have the usual interpretation of frequency. This is the “modulation-invariant” property of [7]. The time-shift property is essential for introducing time dependence into the TFR. Shifting the signal must translate into a similar shift of the time-frequency representation of the signal. Otherwise the variable $t$ would not have the usual meaning of time.

Assumptions 1 and 2 are fundamental properties, and it seems difficult to envision any reasonable definition of a TFR which does not obey them. We note that these properties hold for most of the TFRs discussed in the literature. However, here we propose to use them as the defining properties of proper TFRs, rather than derive them for an assumed TFR structure.

Assumptions 3 and 4 are necessary for $P(t, w)$ to be an energy function. Energy is, of course, non-negative, and is the “squared value” of some underlying physical quantity (voltage, current, velocity, etc.). Properties 3 and 4 could be combined into a single statement saying that any energy estimator $P(t, w)$ must be a non-negative function of the signal which scales quadratically under linear signal scaling.

Let us start by assuming that $P(t, w)$ is an arbitrary function of time and frequency, constructed from the signal. We will then examine the consequences of requiring that this function have the properties above. Thus, let

$$P(t, w) = g(t, w, \mathbf{s}) \tag{1}$$

where $g$ is a mapping from $(t, w, \mathbf{s})$ to $\mathcal{R}$.

Now let us follow the methodology of [7] and consider what happens when we replace $s(t)$ by $s(t)e^{jw_0 t}$. Then $\mathbf{s}$ is replaced by $D(w_0)\mathbf{s}$, where $D(w)$ is the frequency shift operator. A matrix representation of $D(w)$ would be an infinite dimensional diagonal matrix whose diagonal elements are $e^{jwt}$. From property 1 it follows that

$$g(t, w - w_0, \mathbf{s}) = g(t, w, D(w_0)\mathbf{s}) \tag{2}$$

Setting $w = 0$ and then replacing $-w_0$ by $w$ we get

$$g(t, w, \mathbf{s}) = g(t, 0, D(-w)\mathbf{s}) \tag{3}$$

Next consider what happens when we replace $s(t)$ by $s(t - t_0)$. Then $\mathbf{s}$ is replaced by $Z^{-t_0}\mathbf{s}$, where $Z^{-1}$ denotes the delay operator. The unit delay operator $Z^{-1}$ shifts $\mathbf{s}$ one sampling unit. Using property 2 it follows that

$$g(t - t_0, w, \mathbf{s}) = g(t, w, Z^{-t_0}\mathbf{s}) \tag{4}$$

Setting $t = 0$ and then replacing $-t_0$ by $t$ we get

$$g(t, w, \mathbf{s}) = g(0, w, Z^{t}\mathbf{s}) \tag{5}$$

If we combine multiplication by $e^{jw_0 t}$ and delay by $t_0$ it is straightforward to show that

$$g(t, w, \mathbf{s}) = g(0, 0, D(-w)Z^{t}\mathbf{s}) \tag{6}$$

(using time shift followed by frequency shift), and

$$g(t, w, \mathbf{s}) = g(0, 0, Z^{t}D(-w)\mathbf{s}) \tag{7}$$

(using frequency shift followed by time shift). Note that $Z^{t}$ and $D(-w)$ do not commute:

$$Z^{t}D(-w)\mathbf{s} = e^{-jwt}D(-w)Z^{t}\mathbf{s} \tag{8}$$

However, by property 3, $g(t, w, \mathbf{s})$ is invariant to scaling by a unit-modulus constant, so either order of delay and modulation produces the same result:

$$g(t, w, \mathbf{s}) = g(0, 0, Z^{t}D(-w)\mathbf{s}) = g(0, 0, D(-w)Z^{t}\mathbf{s}) \tag{9}$$

This is a representation theorem for the most general form of a proper TFR. The dependence of the proper TFR on time, frequency, and the signal must be in the particular combination appearing above, and $g(0, 0, \mathbf{x})$ must have the quadratic scaling property.

To simplify the notation we will suppress the first two arguments of $g$ from now on. In other words, $g(0, 0, \cdot)$ will be replaced by $g(\cdot)$, and therefore the representation for a proper TFR is

$$P(t, w) = g(Z^{t}D(-w)\mathbf{s}) = g(D(-w)Z^{t}\mathbf{s}) \tag{10}$$

where $g$ satisfies the quadratic scaling property

$$g(c\mathbf{x}) = |c|^2 g(\mathbf{x}) \tag{11}$$

For  fixed  t  and  free  w,  the  TFR  P{t,  w)  is  a  frequency 
scan,  and  for  fixed  w  and  free  t  it  is  a  time  scan.  Each 
of  these  scans  has  a  natural  frequency  or  time-shift 
property. 
3. Time and Frequency Responses of a Proper TFR

It is straightforward to prove a few more facts without specifying the function $g$, such as

1. If $\mathbf{s} = \mathbf{1}$ (where $\mathbf{1}$ is an infinite vector of all 1’s) then $P(t, w)$ is a function of $w$ only. In other words, the TFR of a constant is not time-dependent. This follows immediately from the fact that $Z^{t}\mathbf{1} = \mathbf{1}$, for any $t$, in which case

$$P_1(t, w) = g(D(-w)Z^{t}\mathbf{1}) = g(D(-w)\mathbf{1}) = g(\psi(-w)) \tag{12}$$

where $\psi(w)$ is the infinite DTFT sequence $\{e^{jwn}\}$. This might reasonably be called the frequency response of the proper TFR.

2. If $\mathbf{s} = \delta_n$ (where $\delta_n$ is an infinite vector of all zeros, except for a 1 at the n-th location) then $P(t, w)$ is a function of $t$ only. In other words, the TFR of an impulse is not frequency dependent, in which case

$$P_\delta(t, w) = g(Z^{t}D(-w)\delta_n) = g(Z^{t}\delta_n) \tag{13}$$

where $Z^{t}\delta_n$ is the infinite delta sequence $\delta_{t+n}$. This might reasonably be called the temporal response of the proper TFR.

3. If $s(t) = e^{j(\beta_0 + \beta_1 t + \beta_2 t^2)}$ - a linear chirp - then $P(t, w) = P(0, w - 2\beta_2 t)$. In other words, the TFR has a fixed shape, which gets shifted along the frequency axis at the instantaneous frequency of the chirp.

Property 3 raises the question of the most general signal $\mathbf{s}$ for which an instantaneous carrier frequency may be defined for a proper TFR. We shall say $\beta(t)$ is the instantaneous carrier frequency for a proper TFR $P(t, w)$ if

$$P(t, w) = P(0, w - \beta(t)) \tag{14}$$

That is, the TFR has a fixed shape which propagates along the $(t, w)$ line $w = \beta(t)$. The requirement on $\mathbf{s}$ is

$$Z^{t}D(-w)\mathbf{s} = c\, Z^{0}D(-(w - \beta(t)))\mathbf{s}; \quad |c| = 1 \tag{15}$$

or

$$e^{-jw(n+t)} s(n + t) = c\, e^{-j(w - \beta(t))n} s(n) \tag{16}$$


It is just a few steps to show that the most general sequence with this property is the unit modulus, quadratic phase sequence

$$s(n) = e^{j(\phi_0 + \phi_1 n + \phi_2 n^2)} \tag{17}$$

in which case $\beta(t) = 2\phi_2 t$. In summary, the most general instantaneous carrier frequency we can associate with a proper TFR is a linear carrier frequency - and it is just the instantaneous frequency of a linear chirp!

In  summary,  every  proper  TFR  has  the  representa¬ 
tion  of  equations  (10)  and  (11),  and  the  most  general 
carrier  frequency  that  can  be  determined  for  such  a 
TFR  is  the  linear  carrier  frequency  of  a  linear  chirp. 
No  other  signal  produces  a  proper  TFR  for  which  a 
carrier  frequency  may  be  determined. 
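
The chirp property can be checked numerically. The following is a minimal sketch, assuming the quadratic-phase form of (17) and hypothetical parameter values (`phi0`, `phi1`, `phi2`, `t`, `w` are illustrative choices, not from the paper). It verifies that the two sides of (16) differ only by a constant c of unit modulus when $\beta(t) = 2\phi_2 t$.

```python
import numpy as np

# Hypothetical chirp parameters, chosen only for illustration.
phi0, phi1, phi2 = 0.3, 0.7, 0.01

def chirp(n):
    """Unit-modulus quadratic-phase sequence, as in (17)."""
    return np.exp(1j * (phi0 + phi1 * n + phi2 * n ** 2))

n = np.arange(-50, 51)
t, w = 5, 0.4
beta_t = 2 * phi2 * t                                  # candidate carrier frequency

lhs = np.exp(-1j * w * (n + t)) * chirp(n + t)         # entries of Z^t D(-w) s
rhs = np.exp(-1j * (w - beta_t) * n) * chirp(n)        # entries of D(-(w - beta(t))) s
c = lhs / rhs                                          # should not depend on n
```

If `c` varied with `n`, the shifted and modulated signal would not be a scaled copy of the frequency-shifted original, and (14) would fail.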

4.  Quadratic  TFRs 

To  get  more  useful  insights  it  is  necessary  to  specify 
the  functional  form  of  g.  To  do  so,  we  shall  assume 
that  g  is  quadratic  in  the  data.  That  is, 

$$g(x) = \sum_m\sum_n x^*(m)Q(m,n)x(n); \quad Q(m,n) = Q^*(n,m) \qquad (18)$$

where  Q  is  an  infinite  dimensional  non-negative  definite 
kernel.  In  this  case  the  quadratic  proper  TFR  is 

$$P(t,w) = \sum_m\sum_n s^*(m+t)\,e^{jwm}\,Q(m,n)\,e^{-jwn}\,s(n+t) \qquad (19)$$

This  is  a  representation  theorem  for  the  most  general 
quadratic  proper  TFR. 

A reasonable way to normalize P(t,w) is to require its expected value to be 1 when s is stationary unit variance white noise:

$$E\{P(t,w)\} = 1 = \sum_n Q(n,n) \qquad (20)$$

In this case P(0,0) is a reasonable definition of generalized energy of s:

$$P(0,0) = \sum_m\sum_n s^*(m)Q(m,n)s(n) \qquad (21)$$

The slices P(0,w) and P(t,0) are, respectively, frequency scans and time scans:

$$P(0,w) = \sum_m\sum_n s^*(m)Q(m,n)s(n)\,e^{-jw(n-m)} \qquad (22)$$

$$P(t,0) = \sum_m\sum_n s^*(m+t)Q(m,n)s(n+t) \qquad (23)$$

One can go further to constrain Q by imposing restrictions on marginals. However, there is no choice of Q which reproduces both $|s(t)|^2$ and $|S(e^{jw})|^2$ for its marginals.

Now suppose the kernel Q is the rank-r non-negative definite kernel

$$Q(m,n) = \sum_{i=1}^{r} f_i^*(m)f_i(n) \qquad (24)$$

where the sequences $f_i$, i = 1, 2, ..., r, are linearly independent in $l_2(C)$ and normalized so that

$$\sum_{i=1}^{r}\sum_n |f_i(n)|^2 = 1 \qquad (25)$$

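Then the quadratic proper TFR is the following multiwindow STFT

$$P(t,w) = \sum_{i=1}^{r}\Big|\sum_n f_i(n)s(n+t)e^{-jwn}\Big|^2 \qquad (26)$$

This fixed-window, sliding-signal version of the TFR may also be written in the fixed-signal, sliding-window and its filtering forms:

$$P(t,w) = \sum_{i=1}^{r}\Big|\sum_n f_i(n-t)s(n)e^{-jwn}\Big|^2 \qquad (27)$$

$$= \sum_{i=1}^{r}\Big|\sum_n h_i(t-n)s(n)e^{-jwn}\Big|^2 \qquad (28)$$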

If the impulse response $h_i(t) = f_i(-t)$ is finitely parametrized as an ARMA impulse response, then the rank-r multiwindow STFT is finitely computable, for each t, on a finite lattice of the Nyquist band $(-\pi, \pi]$. The window design problem for quadratic proper TFRs is to design normalized windows $f_i$ which concentrate energy in (t,w) cells of shape $[-\alpha, \alpha] \times [-\beta\pi, \beta\pi]$, where $2\alpha$ is the desired temporal resolution and $2\beta$ the derived frequency resolution.

If the windows $f_i$ are orthonormal, i.e. $\langle f_i, f_j\rangle = \delta_{ij}$, then the TFR may be written as

$$P(t,w) = \|P_f Z^t D(-w)s\|^2 \qquad (29)$$

where $P_f$ is a projection onto the rank-r subspace of $l_2$ spanned by the windows $f_i$. This makes the most general rank-r, quadratic, proper TFR a matched subspace energy detector [9].
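
The equivalence between the quadratic form and the multiwindow STFT can be illustrated numerically. This is a minimal sketch, assuming finite-length windows and the conjugation convention $Q(m,n) = \sum_i f_i^*(m)f_i(n)$; the window matrix `F`, the test signal `s` and the function names are hypothetical, introduced only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
L, r = 16, 3                                   # window length and kernel rank (illustrative)

# Hypothetical windows: r orthonormal columns, rescaled so that
# sum_i sum_n |f_i(n)|^2 = 1, which is the normalization in (25).
F, _ = np.linalg.qr(rng.standard_normal((L, r)) + 1j * rng.standard_normal((L, r)))
F = F / np.sqrt(r)
Q = F.conj() @ F.T                             # Q(m, n) = sum_i f_i^*(m) f_i(n)

s = rng.standard_normal(64) + 1j * rng.standard_normal(64)

def windowed(s, t, w, L):
    """x(n) = e^{-jwn} s(n + t) on the window support n = 0..L-1."""
    n = np.arange(L)
    return np.exp(-1j * w * n) * s[n + t]

def tfr_quadratic(s, Q, t, w):
    x = windowed(s, t, w, Q.shape[0])
    return float(np.real(x.conj() @ Q @ x))    # sum_m sum_n x^*(m) Q(m,n) x(n)

def tfr_multiwindow(s, F, t, w):
    x = windowed(s, t, w, F.shape[0])
    return float(np.sum(np.abs(F.T @ x) ** 2)) # sum_i |sum_n f_i(n) x(n)|^2
```

Both functions evaluate the same quantity; the second costs r inner products per (t,w) point instead of a full quadratic form.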

We note that some of the TFRs proposed in the literature employ kernels Q which are signal dependent, see e.g. [2]. Strictly speaking, such TFRs do not belong to the class of quadratic TFRs, since they are complicated non-quadratic functions of the signal. Furthermore, these TFRs are proper TFRs only if the kernel depends on the signal in the particular functional form discussed earlier, i.e., $Q(Z^tD(-w)s)$ or $Q(D(-w)Z^ts)$. The signal dependent kernels proposed in the literature seem to all have this property.


5.  TFRs  for  Finite  Sequences 


In many situations the signals for which we wish to compute TFRs will be of finite extent, i.e. s(t) will be non-zero only for t = 0, ..., N-1. In this section we will specialize the quadratic TFRs for infinite data sequences to the case of finite sequences.

First note that in the finite data case P(t,w) needs to be computed only for t = 0, ..., N-1. In other words, it is not necessary to consider more time-shifts than there are data samples. In order to do this properly we will define the signal vector as the signal samples, preceded by N-1 zeros,

$$[\underbrace{0, \ldots, 0}_{N-1\ \text{zeros}},\ s(0), \ldots, s(N-1)]^T \qquad (30)$$

This will "leave room" for the necessary number of shifts. This zero-padded vector can be written more compactly as Fs, where $s = [s(0), s(1), \ldots, s(N-1)]^T$. That is,

$$F = \begin{bmatrix} 0 \\ I_N \end{bmatrix} \qquad (31)$$

where 0 is an (N-1) x N matrix of zeros, $I_N$ is the N x N identity matrix, and Fs is a (2N-1) x 1 vector. In the following we will denote the length of Fs by M, i.e. M = 2N-1.

When dealing with finite sequences it is convenient to use the circular delay operator, which will be defined by $Z^{-1}$, the circular down-shift matrix,

$$Z^{-1} = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix} \qquad (32)$$

The circular "advance" operator is given by Z, the circular up-shift matrix, which is the transpose of $Z^{-1}$,

$$Z = (Z^{-1})^{-1} = (Z^{-1})^T \qquad (33)$$

We define the modulation matrix

$$D(w) = \mathrm{diag}\{\psi(w)\} \qquad (34)$$

where $\psi(w)$ is the DTFT vector,

$$\psi(w) = [1, e^{jw}, \ldots, e^{jw(M-1)}]^T \qquad (35)$$

Using this notation, it is straightforward to verify that the quadratic TFR introduced earlier can be rewritten as

$$P(t,w) = s^HF^TZ^{-t}D(w)\,Q\,D(-w)Z^tFs \qquad (36)$$

or equivalently

$$P(t,w) = s^HF^TD(w)Z^{-t}\,Q\,Z^tD(-w)Fs \qquad (37)$$

where t = 0, ..., N-1. It is easy to show that P(t,w) has the required properties for a proper TFR. The matrix $Q \geq 0$ is an M x M block of the infinite dimensional kernel Q defined earlier. Equations (36) and (37) provide a representation theorem for quadratic proper TFRs, for the finite data case.
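
The finite-data operators can be built explicitly. The sketch below is illustrative only: the kernel block `Q` is a hypothetical random non-negative definite matrix and the function names are not from the paper. It constructs F, Z and D(w) as in (31)-(35) and evaluates the two forms (36) and (37). For t = 0, ..., N-1 the zero padding ensures that no non-zero sample wraps around, so the two vectors differ only by the phase $e^{-jwt}$ and the quadratic forms coincide.

```python
import numpy as np

def operators(N):
    """Return F (zero padding), Z (circular up-shift) and M = 2N - 1."""
    M = 2 * N - 1
    F = np.vstack([np.zeros((N - 1, N)), np.eye(N)])   # eq. (31)
    Z_inv = np.roll(np.eye(M), 1, axis=0)              # eq. (32), circular down-shift
    return F, Z_inv.T, M                               # eq. (33): Z = (Z^{-1})^T

def D(w, M):
    """Modulation matrix diag{psi(w)}, eqs. (34)-(35)."""
    return np.diag(np.exp(1j * w * np.arange(M)))

def tfr_36(s, Q, t, w):
    F, Z, M = operators(len(s))
    x = D(-w, M) @ np.linalg.matrix_power(Z, t) @ F @ s
    return float(np.real(x.conj() @ Q @ x))

def tfr_37(s, Q, t, w):
    F, Z, M = operators(len(s))
    x = np.linalg.matrix_power(Z, t) @ D(-w, M) @ F @ s
    return float(np.real(x.conj() @ Q @ x))

rng = np.random.default_rng(1)
N = 6
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
G = rng.standard_normal((2 * N - 1, 3)) + 1j * rng.standard_normal((2 * N - 1, 3))
Q = G @ G.conj().T        # hypothetical rank-3 non-negative definite kernel block
```

Dense matrices are used here for clarity; in practice the shifts and modulations would be applied as index operations and elementwise products.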

Note that because of the presence of N-1 zeros in Fs, there are parts of Q which never enter in the computation of P(t,w). Specifically, only the values of Q which are in the band of width M, between the N-th upper diagonal and N-th lower diagonal, enter the calculations. Values outside this band could be arbitrary.

In  [10]  we  relate  the  form  of  the  TFR  in  the  equation 
above  to  the  multi- windowed  short-time  Fourier  trans¬ 
form  (STFT),  and  also  present  an  alternative  form  of 
the  quadratic  proper  TFR. 

6.  Conclusions 

Starting  with  the  requirement  that  any  “proper” 
TFR  must  obey  natural  time-  and  frequency-shift 
properties,  have  a  quadratic  scaling  property,  and  be 
non-negative,  we  have  derived  the  structure  of  such 
time-frequency  representations  in  equation  (10)  and 
(11).  We  have  shown  that  any  quadratic  proper  TFR 
must  have  the  form  specified  in  equation  (19).  In  the 
finite  data  case  the  quadratic  proper  TFR  must  have 
the  form  of  equations  (36)  or  (37).  When  the  kernel 
of  the  TFR  is  of  rank  r,  the  TFR  is  just  a  windowed 
STFT. 

The  main  point  we  have  tried  to  make  is  that  the 
TFR  structure  is  a  direct  consequence  of  a  simple  set  of 
defining  properties.  Thus,  it  is  not  necessary  to  intro¬ 
duce  various  auxiliary  notions  such  as  time-frequency 
distributions  obeying  certain  marginal  properties,  in 
order  to  come  up  with  quadratic  TFRs. 
ABSTRACT

Parameter  estimation  for  a  combination  of  a  polynomial 
phase  signal  (PPS)  and  a  frequency  modulated  (FM)  sig¬ 
nal  is  addressed.  A  novel  approach  is  proposed  that  al¬ 
lows  one  to  decouple  estimation  of  the  FM  parameters  from 
that of the PPS parameters, exploiting the properties of the multi-lag High-order Ambiguity Function (ml-HAF). Performance analysis is carried out and Cramér-Rao bounds are compared with simulation results.


1.  PROBLEM  STATEMENT 

The discrete-time model of the hybrid FM-PPS signal of interest is:

$$s(n) = \rho\,\exp\Big\{j2\pi\Big[b\sin(\omega_0 n + \phi_0) + \sum_{m=0}^{M} a_m\frac{n^m}{m!}\Big]\Big\} \qquad (1)$$

where $\{a_m\}_{m=0}^{M}$ are the PPS parameters, $\rho$ denotes a constant (perhaps unknown) amplitude, and M is the PPS order; the constant b > 0 is the so-called modulation index, and $\omega_0$, $\phi_0$ stand for the frequency and initial phase, respectively.

This paper concerns parameter estimation of the hybrid FM-PPS signal in (1) from a finite number of N noisy observations, $\{x(n) = s(n) + v(n)\}_{n=0}^{N-1}$. The additive noise v(n) is assumed stationary, complex white Gaussian, mixing in the sense of [4, p. 25], with zero mean and variance $\sigma_v^2$. Signals described by (1) are encountered in many engineering applications such as radar, SAR, sonar, acoustics, and optics. When the target is moving, the received signal can be modeled as a discrete-time PPS (see e.g., [10]), and the polynomial coefficients $\{a_i\}$ are related to the kinematics of the moving target [8, p. 403]. The High-order Instantaneous Moment (HIM) and its Fourier transform, the High-order Ambiguity Function (HAF), introduced by Peleg and Porat (see [8, Ch. 12]), have provided a simple albeit sub-optimum algorithm for estimating the PPS coefficients recursively.

Signals arising from moving targets, however, cannot always be modeled as a pure PPS. For example, sinusoidal FM signals arise from vibrating targets [5], [12], or rotating parts of the target [3], [7]. When both of these effects (motion and vibration/rotation) are present, the noise-free backscattered signal obeys (1), and the estimation algorithms proposed in [5] and [11] for a pure sinusoidal FM are no longer appropriate.

The  main  contribution  of  this  paper  is  to  show  that  the 
HAF  offers  a  good  alternative  to  the  computationally  in¬ 
tensive  maximum  likelihood  (ML)  approach,  even  when  the 
observed  data  cannot  be  modeled  as  a  pure  PPS,  but  a 
sinusoidal  FM  component  is  also  present. 

The multi-lag HIM is a nonlinear transformation, originally introduced for continuous-time signals in [1], defined recursively as [2]:

$$x_1(n) = x(n), \quad x_2(n;\tau_1) = x_1(n+\tau_1)\,x_1^*(n-\tau_1), \ \ldots,$$

$$x_M(n;\tau_1,\ldots,\tau_{M-1}) = x_{M-1}(n+\tau_{M-1};\tau_1,\ldots,\tau_{M-2})\; x_{M-1}^*(n-\tau_{M-1};\tau_1,\ldots,\tau_{M-2}). \qquad (2)$$

We term $x_M$ the multi-lag HIM (ml-HIM) because it reduces to the HIM for $\tau_1 = \ldots = \tau_{M-1} = \tau$ [8, Ch. 12], [10]. The ml-HAF is defined as the (generalized) Fourier Series of the ml-HIM:

$$X_M(\alpha;\tau) := \lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} x_M(n;\tau_1,\ldots,\tau_{M-1})\,e^{-j\alpha n}, \qquad (3)$$

where $\tau := [\tau_1\ \tau_2\ \ldots\ \tau_{M-1}]^T$.
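
The recursion in (2) and a finite-sample version of (3) can be sketched as follows. This is an illustrative implementation under stated assumptions: only the samples for which both $n+\tau$ and $n-\tau$ fall inside the record are kept, the limit in (3) is replaced by a zero-padded FFT over the available samples, and the names `ml_him` and `ml_haf` are hypothetical. The index offset of the trimmed sequence only affects the phase of the result, not the peak locations.

```python
import numpy as np

def ml_him(x, lags):
    """Multi-lag HIM of order M = len(lags) + 1, following the recursion (2).

    Returns the valid part of the sequence and the index of its first sample.
    """
    y = np.asarray(x, dtype=complex)
    start = 0
    for tau in lags:
        # y_new(n) = y(n + tau) * conj(y(n - tau)), valid away from both edges
        y = y[2 * tau:] * np.conj(y[:-2 * tau])
        start += tau
    return y, start

def ml_haf(x, lags, nfft):
    """Finite-sample ml-HAF on an FFT grid of nfft frequencies in [-pi, pi)."""
    y, _ = ml_him(x, lags)
    spec = np.fft.fft(y, nfft) / len(x)
    alpha = 2 * np.pi * np.fft.fftfreq(nfft)
    return alpha, spec

# Usage: a pure second-order PPS (no FM term), with illustrative parameters.
a1, a2, tau = 0.1, 1e-3, 10
n = np.arange(512)
x = np.exp(1j * 2 * np.pi * (a1 * n + a2 * n ** 2 / 2))
alpha, spec = ml_haf(x, [tau], 4096)
peak = alpha[np.argmax(np.abs(spec))]          # expected near 4*pi*a2*tau
```

For this second-order example the ml-HIM is a single complex exponential, so the ml-HAF magnitude has one peak whose location is proportional to the highest-order coefficient.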


2.  ESTIMATION  ALGORITHMS 

Applying the ml-HIM in (2) to the noise-free FM-PPS signal we find:

$$s_M(n;\tau) = \rho^{2^{M-1}}\,e^{j[\omega_c n + \psi_c + \beta\sin(\omega_0 n + \psi_0)]} \qquad (4)$$

where:

$$\omega_c := 2^M\pi\,a_M\prod_{m=1}^{M-1}\tau_m, \qquad \psi_c := 2^M\pi\,a_{M-1}\prod_{m=1}^{M-1}\tau_m, \qquad (5)$$

$$\beta := 2^M\pi\,b\prod_{m=1}^{M-1}\sin(\omega_0\tau_m), \qquad \psi_0 := \phi_0 + (M-1)\frac{\pi}{2}. \qquad (6)$$

Eq. (4) shows that the M-th order ml-HIM of a sinusoidal FM signal is still a sinusoidal FM signal, with the same vibration frequency $\omega_0$ but with a different modulation index, $\beta$, that is proportional to the original one, b.

Being (almost) periodic in n, $s_M(n;\tau)$ of (4) can be written as a superposition of equally spaced harmonics $\{\omega_c \pm k\omega_0\}$ with amplitudes $\{J_k(\beta)\}$, the kth-order Bessel functions of the first kind [9, p. 311]:

$$s_M(n;\tau) = \rho^{2^{M-1}}\sum_{k=-\infty}^{\infty} J_k(\beta)\,e^{j[(\omega_c + k\omega_0)n + \psi_c + k\psi_0]}. \qquad (7)$$

Because $J_k(\beta)$ vanishes for large |k| [9], there exists a large enough integer K such that most of the energy in $\{J_k(\beta)\}$ is contained in the range $k \in [-K, K]$. It has been shown in [11] that the smallest integer K for which $\sum_{k=-K}^{K} J_k^2(\beta) > 0.99$ is $K \approx \beta + 1$ for $\beta > 1$, whereas K = 0 for $\beta \in [0, 0.14)$. For $\beta \in [0.14, 1]$, K is either 1 or 2. Depending on $\beta$ we can thus approximate the infinite sum in (7) by the partial sum

$$s_M(n;\tau) \approx \rho^{2^{M-1}}\sum_{k=-K}^{K} J_k(\beta)\,e^{j[(\omega_c + k\omega_0)n + \psi_c + k\psi_0]}, \qquad (8)$$

which consists of multi-component constant amplitude tones with harmonic frequencies. Fourier transforming (8), we obtain the ml-HAF as

$$S_M(\alpha;\tau) = \rho^{2^{M-1}}\sum_{k=-K}^{K} J_k(\beta)\,e^{j(\psi_c + k\psi_0)}\,\delta(\alpha - \omega_c - k\omega_0), \qquad (9)$$

where $\delta(\cdot)$ denotes the Kronecker delta function.
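
The Bessel expansion and the 99% energy rule can be explored numerically. The sketch below is an assumption-laden illustration: it obtains $J_k(\beta)$ as the Fourier coefficients of $e^{j\beta\sin\theta}$ computed with an FFT on a uniform grid, rather than from a special-function library, and the helper names are hypothetical.

```python
import numpy as np

def bessel_coeffs(beta, K, n_grid=1024):
    """J_k(beta) for k = -K..K, as Fourier coefficients of exp(j*beta*sin(theta))."""
    theta = 2 * np.pi * np.arange(n_grid) / n_grid
    c = np.fft.fft(np.exp(1j * beta * np.sin(theta))) / n_grid
    k = np.arange(-K, K + 1)
    return k, c[k % n_grid].real               # the coefficients are real

def smallest_K(beta, energy=0.99, K_max=200):
    """Smallest K with sum_{k=-K}^{K} J_k(beta)^2 > energy."""
    for K in range(K_max + 1):
        _, J = bessel_coeffs(beta, K)
        if np.sum(J ** 2) > energy:
            return K
    raise ValueError("K_max too small for this beta")

K_wide = smallest_K(3.6996)     # wideband value of beta quoted in the text
K_narrow = smallest_K(0.1)      # a small beta, inside [0, 0.14)
```

The FFT route works because the sampled function is smooth and periodic, so aliasing is negligible for moderate $\beta$ and grid sizes much larger than $\beta$.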

The ml-HAF in (9) peaks at $\{\omega_c + k\omega_0\}_{k=-K}^{K}$. Hence, the vibration frequency $\omega_0$ is given by the distance between successive peaks of $|S_M(\alpha;\tau)|$. The position of the central peak is related to the highest order PPS coefficient $a_M$ through $\omega_c$ [c.f. (5)]. Then, making use of (9) one can obtain $\beta$ and $\psi_0$, and subsequently the modulation index b and the phase $\phi_0$, from the values at the peaks.

Sample estimates of $S_M(\alpha;\tau)$ are computed from $\{x(n)\}_{n=0}^{N-1}$ as

$$\hat{S}_M(\alpha;\tau) = \frac{1}{N}\sum_{n=0}^{N-1} x_M(n;\tau)\,e^{-j\alpha n} \qquad (10)$$

where $x_M(n;\tau)$ is the ml-HIM, and $\hat{S}_M(\alpha;\tau) = \hat{X}_M(\alpha;\tau)$ is the ml-HAF of the data. Under the stated assumptions on v(n), the estimator in (10) is asymptotically unbiased and consistent in the mean square sense (see [10] for a proof). Once $\hat{S}_M(\alpha;\tau)$ is computed, the parameters of interest can be estimated by substituting $\hat{S}_M(\alpha;\tau)$ for $S_M(\alpha;\tau)$ in (9).

We can take advantage of the multiple lags and calculate $X_M$ for L different sets of lags, $\tau_l := [\tau_{1,l}\ \ldots\ \tau_{M-1,l}]$, l = 1, 2, ..., L, having the same product; i.e., $\prod_m \tau_{m,l_1} = \prod_m \tau_{m,l_2}$ for all $l_1, l_2 \in [1, L]$. The product of L ml-HAF amplitudes, $\prod_{l=1}^{L}|X_M(\alpha;\tau_l)|$, enhances the desired peaks and reduces noise further than the single ml-HAF [2].

The  resulting  algorithm  is  summarized  in  the  following 
steps. 

Step 1: Calculate from the data $x_M(n;\tau_l)$, the ml-HIM of order M, for each set of lags $\tau_l$, l = 1, 2, ..., L.

Step 2: Estimate the ml-HAF of s(n) using (10).

Step 3: Find the peak locations of the product ml-HAF, $\prod_{l=1}^{L}|\hat{X}_M(\alpha;\tau_l)|$, corresponding to $\{\omega_c + k\omega_0\}_{k=-K}^{K}$ [see also (9)], and collect these values in a (2K + 1) x 1 vector p.

Step 4: Define $A := [\mathbf{k}\ \mathbf{1}]$, $\omega := [\omega_0\ \omega_c]^T$, and

$$\mathbf{k} := [-K\ \ {-K+1}\ \ \ldots\ \ K-1\ \ K]^T,$$

where $\mathbf{1}$ is the vector of ones. Solve the overdetermined linear system $A\omega = p$, to compute $\hat{\omega}_0$ and $\hat{\omega}_c$ as: $\hat{\omega} = (A^TA)^{-1}A^Tp$.

Step 5: Estimate $a_M$ as:

$$\hat{a}_M = \frac{\hat{\omega}_c}{2^M\pi\prod_{m=1}^{M-1}\tau_m}. \qquad (11)$$

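Step 6: Estimating b from (10) and (6) requires some care, because (10) is exact only as $N \to \infty$. For finite sample size we find from (1) and (10) that:

$$\hat{S}_M(\omega_c + k\omega_0;\tau_l) = \rho^{2^{M-1}}\,\frac{N - 2\sum_{m=1}^{M-1}\tau_{m,l}}{N}\,J_k(\beta_l)\,e^{j(\psi_c + k\psi_0)} + \text{noise terms}, \qquad (12)$$

where $\beta_l := 2^M\pi b\prod_{m=1}^{M-1}\sin(\omega_0\tau_{m,l})$. To estimate b based on (10), we first find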


JkiPi)  : 


SMi^c  d-  kQo;Ti) 


and  estimate  Pi  as  (see  also  [11]): 

2s 


,  for  G  [-KJ<] , 
(13) 


Pi^ 


In 


,fc=-/C 


(14) 


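Finally, obtain an estimate of b via (6).

Step 7: Define:

$$\psi := [\psi_0\ \psi_c]^T, \qquad \Phi_{k,l} := \arg\{\hat{S}_M(\hat{\omega}_c + k\hat{\omega}_0;\tau_l)\},$$

$$\Phi := [\Phi_{-K,1}\ \ldots\ \Phi_{K,1}\ \ \ldots\ \ \Phi_{-K,L}\ \ldots\ \Phi_{K,L}]^T$$

and the (2K + 1)L x 2 dimensional matrix B as:

$$B := [A^T\ \cdots\ A^T]^T.$$

Solve the overdetermined linear system $B\psi = \Phi$ to find $\hat{\psi}_0$ and $\hat{\psi}_c$ as: $\hat{\psi} = (B^TB)^{-1}B^T\Phi$.

Then, $\hat{\phi}_0$ is obtained from $\hat{\psi}_0$. Now, having estimated $a_M$ and the FM component, we can remove it from the data and proceed with the PPS part using techniques described in [10].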
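
The least-squares fit of Step 4 can be sketched as follows. This is a minimal illustration, assuming the 2K + 1 peak locations have already been extracted and sorted by harmonic index k; the function name and the synthetic peak vector are hypothetical. It uses `numpy.linalg.lstsq` rather than forming $(A^TA)^{-1}A^Tp$ explicitly, which gives the same solution for a full-rank A.

```python
import numpy as np

def fit_harmonic_peaks(p, K):
    """Least-squares fit p_k ~ omega_c + k * omega_0 for k = -K..K (Step 4)."""
    k = np.arange(-K, K + 1, dtype=float)
    A = np.column_stack([k, np.ones_like(k)])          # A := [k 1]
    (w0_hat, wc_hat), *_ = np.linalg.lstsq(A, np.asarray(p, dtype=float), rcond=None)
    return w0_hat, wc_hat

# Usage with synthetic, slightly perturbed peak locations (illustrative values).
rng = np.random.default_rng(2)
K, w0_true, wc_true = 5, 0.0491, 0.5629
p = wc_true + w0_true * np.arange(-K, K + 1) + 1e-5 * rng.standard_normal(2 * K + 1)
w0_hat, wc_hat = fit_harmonic_peaks(p, K)
```

Because all 2K + 1 peaks contribute to the fit, errors in individual peak locations are averaged out in the estimates of $\omega_0$ and $\omega_c$.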

As an example, in Fig. 1a we plot the values of the Bessel coefficients $J_k(\beta)$ versus k, for b = 6, $\omega_0$ = 0.0491, $\tau_1$ = 129. From (5)-(6) we find $\omega_c$ = 0.5629 and $\beta$ = 3.6996. The values of the other parameters can be found in Example 2, Section 4. As expected, the coefficients become negligible for |k| > 5 in (7), confirming that $K \approx \beta + 1$. In Fig. 1b the ml-HAF $X_2(\alpha;\tau_1)$ is plotted for $\alpha \in [0, 1]$, SNR := $\rho^2/\sigma_v^2$ = 16 dB, and N = 1,024. From the location (and value) of the peaks we derive our estimates.

Figure 1. (a) Bessel coefficients $J_k(\beta)$ vs. k, for $\beta$ = 3.6996. (b) $X_2(\alpha;\tau_1)$ vs. $\alpha$, for SNR = 16 dB and N = 1,024.

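Narrowband FM. According to Carson's rule the bandwidth of an FM signal is approximately $2\omega_0(1 + \beta)$ [9, p. 313]. We will refer to the case discussed so far, for general $\beta$, as the wideband FM (WBFM). Expanding $\exp[j\beta\sin(\omega_0 n + \psi_0)]$ in Taylor series we find:

$$e^{j\beta\sin(\omega_0 n + \psi_0)} = \sum_{k=0}^{\infty}\frac{[j\beta\sin(\omega_0 n + \psi_0)]^k}{k!}. \qquad (15)$$

If $\beta$ is small enough, we can truncate the series expansion as follows:

$$e^{j\beta\sin(\omega_0 n + \psi_0)} \approx 1 + j\beta\sin(\omega_0 n + \psi_0) - \frac{\beta^2}{2}\sin^2(\omega_0 n + \psi_0)$$

$$= 1 - \frac{\beta^2}{4} + j\beta\sin(\omega_0 n + \psi_0) + \frac{\beta^2}{4}\cos(2\omega_0 n + 2\psi_0).$$

A value of $\beta < 1$ is sufficient to guarantee that the second order approximation is valid. We will refer to the case of $\beta < 1$ as narrowband FM (NBFM). Using this approximation in (8) we obtain:

$$S_M(\alpha;\tau) \approx \rho^{2^{M-1}}\Big[\frac{\beta^2}{8}e^{j(\psi_c - 2\psi_0)}\delta(\alpha - \omega_c + 2\omega_0) - \frac{\beta}{2}e^{j(\psi_c - \psi_0)}\delta(\alpha - \omega_c + \omega_0) + \Big(1 - \frac{\beta^2}{4}\Big)e^{j\psi_c}\delta(\alpha - \omega_c)$$

$$+ \frac{\beta}{2}e^{j(\psi_c + \psi_0)}\delta(\alpha - \omega_c - \omega_0) + \frac{\beta^2}{8}e^{j(\psi_c + 2\psi_0)}\delta(\alpha - \omega_c - 2\omega_0)\Big], \qquad (16)$$

which makes estimation of $\beta$ from the amplitude of the peaks easier. Note that the bandwidth of the signal in (16) is $4\omega_0$ as predicted by Carson's rule (with $\beta$ = 1). The estimation algorithm is obtained by modifying Step 6 as follows:

Step 6 (NBFM): Exploit (16) to obtain different estimates of $\beta_l$ as:

$$\hat{\beta}_{0,l} = 2\sqrt{1 - \frac{|\hat{S}_M(\hat{\omega}_c;\tau_l)|}{\rho^{2^{M-1}}(N - 2\sum_{m=1}^{M-1}\tau_{m,l})/N}},$$

$$\hat{\beta}_{1,l} = \frac{|\hat{S}_M(\hat{\omega}_c + \hat{\omega}_0;\tau_l)| + |\hat{S}_M(\hat{\omega}_c - \hat{\omega}_0;\tau_l)|}{\rho^{2^{M-1}}(N - 2\sum_{m=1}^{M-1}\tau_{m,l})/N}, \qquad (17)$$

$$\hat{\beta}_{2,l} = 2\sqrt{\frac{|\hat{S}_M(\hat{\omega}_c + 2\hat{\omega}_0;\tau_l)| + |\hat{S}_M(\hat{\omega}_c - 2\hat{\omega}_0;\tau_l)|}{\rho^{2^{M-1}}(N - 2\sum_{m=1}^{M-1}\tau_{m,l})/N}},$$

and then take their mean, $\hat{\beta} = \frac{1}{3L}\sum_{l=1}^{L}\sum_{i=0}^{2}\hat{\beta}_{i,l}$. Finally, b is derived by inserting $\hat{\beta}$ in (6). The other steps remain unchanged. It is worth observing that because $\beta = 2^M\pi b\prod_{m=1}^{M-1}\sin(\omega_0\tau_m)$, it is possible to select the lags $\tau$ in order to have $\beta < 1$. This allows estimation of $\beta$ using the second-order approximation (16).


3. PERFORMANCE EVALUATION

Consider the PPS-FM model in (1) and denote the parameter vector as $\theta$. To avoid numerical problems in the inversion of the Fisher information matrix, it is useful to define $\theta$ as follows:

$$\theta := [N^0a_0\ \ldots\ N^Ma_M\ \ b\ \ N\omega_0\ \ \phi_0]^T. \qquad (18)$$

If $\hat{\theta}$ is an unbiased estimator of $\theta$, then it must satisfy cov($\hat{\theta}$) $\geq J^{-1}$, where J is the so-called Fisher information matrix (FIM). As a first step towards deriving the CRLBs, let us write the observed data in vector form as $x = s(\theta) + v$, where $[x]_n := x(n)$, n = 0, 1, ..., N-1; similar definitions apply to $s(\theta)$ and v. Due to the assumptions on v(n), we have that x is an N x 1 complex Gaussian circular vector, with mean value E{x} = s and covariance matrix $E\{(x - s)(x - s)^H\} = \sigma_v^2 I$, where $^H$ denotes conjugate transpose, and I is an N x N identity matrix. Calculation of the FIM for a vector of correlated complex Gaussian observations is carried out in [6, Ch. 3], and in our case it produces for the (i, j)th entry:

$$J_{ij} = \frac{2}{\sigma_v^2}\,\mathrm{Re}\Big\{\frac{\partial s^H(\theta)}{\partial\theta_i}\frac{\partial s(\theta)}{\partial\theta_j}\Big\}. \qquad (19)$$

The CRLBs are obtained from the diagonal elements of $J^{-1}$, which we denote as CRLB($\theta_i$) = $[J^{-1}]_{ii}$. Inserting s(n) of (1) in (19) and taking partial derivatives, we obtain the elements of the FIM. Evaluation of $J_{ij}$ is not difficult but tedious. Skipping the details, we found:

$$J_{a_i,a_m} = \frac{8\pi^2\rho^2}{i!\,m!\,\sigma_v^2}\sum_{n=0}^{N-1}\Big(\frac{n}{N}\Big)^{i+m}, \quad 0 \leq i, m \leq M, \qquad (20)$$

$$J_{b,b} = \frac{8\pi^2\rho^2}{\sigma_v^2}\sum_{n=0}^{N-1}\sin^2(\omega_0 n + \phi_0), \qquad (21)$$

$$J_{\omega_0,\omega_0} = \frac{8\pi^2\rho^2b^2}{\sigma_v^2}\sum_{n=0}^{N-1}\Big(\frac{n}{N}\Big)^2\cos^2(\omega_0 n + \phi_0), \qquad (22)$$

$$J_{\phi_0,\phi_0} = \frac{8\pi^2\rho^2b^2}{\sigma_v^2}\sum_{n=0}^{N-1}\cos^2(\omega_0 n + \phi_0), \qquad (23)$$

$$J_{a_i,b} = \frac{8\pi^2\rho^2}{i!\,\sigma_v^2}\sum_{n=0}^{N-1}\Big(\frac{n}{N}\Big)^{i}\sin(\omega_0 n + \phi_0), \qquad (24)$$

$$J_{a_i,\omega_0} = \frac{8\pi^2\rho^2b}{i!\,\sigma_v^2}\sum_{n=0}^{N-1}\Big(\frac{n}{N}\Big)^{i+1}\cos(\omega_0 n + \phi_0), \qquad (25)$$

$$J_{a_i,\phi_0} = \frac{8\pi^2\rho^2b}{i!\,\sigma_v^2}\sum_{n=0}^{N-1}\Big(\frac{n}{N}\Big)^{i}\cos(\omega_0 n + \phi_0), \qquad (26)$$

$$J_{b,\omega_0} = \frac{4\pi^2\rho^2b}{\sigma_v^2}\sum_{n=0}^{N-1}\frac{n}{N}\sin(2\omega_0 n + 2\phi_0), \qquad (27)$$

$$J_{b,\phi_0} = \frac{4\pi^2\rho^2b}{\sigma_v^2}\sum_{n=0}^{N-1}\sin(2\omega_0 n + 2\phi_0), \qquad (28)$$

$$J_{\omega_0,\phi_0} = \frac{8\pi^2\rho^2b^2}{\sigma_v^2}\sum_{n=0}^{N-1}\frac{n}{N}\cos^2(\omega_0 n + \phi_0). \qquad (29)$$
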

Equations (20)-(29) allow the exact numerical evaluation of the FIM, but it is not clear how the parameters affect the bounds. However, a large sample approximation can be derived by regarding the sums as an Euler approximation of the integral, $\sum_{n=0}^{N-1} f(n) \approx \int_0^N f(t)\,dt$. With this in mind, we obtain:

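$$J_{a_i,a_m} \approx \frac{8\pi^2\rho^2}{\sigma_v^2}\,\frac{N}{(i + m + 1)\,i!\,m!}, \quad 0 \leq i, m \leq M,$$

$$J_{b,b} \approx \frac{4\pi^2\rho^2}{\sigma_v^2}\Big[N - \frac{\sin(2\omega_0 N + 2\phi_0) - \sin(2\phi_0)}{2\omega_0}\Big],$$

$$J_{\omega_0,\omega_0} \approx \frac{4\pi^2\rho^2b^2}{\sigma_v^2}\Big[\frac{N}{3} + \frac{\sin(2\omega_0 N + 2\phi_0)}{2\omega_0}\Big],$$

$$J_{\phi_0,\phi_0} \approx \frac{4\pi^2\rho^2b^2}{\sigma_v^2}\Big[N + \frac{\sin(2\omega_0 N + 2\phi_0) - \sin(2\phi_0)}{2\omega_0}\Big],$$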


^a.,: 


Jai  yU2Q 

'^ClO><Po 


Jh, 


UJO 


Jb,4>o 


1^0 


cos(ijjQN  +  <^o)  “  cos{<f>o) 

i!  (T? 

Wo 

cos(u;oiV  +  ^o)_ 

d  CTv 

Wo 

Sir^p^b 

sin  ( Wo  AT  +  0o) 

z!  (tI 

Wo 

p^b 

sin(woAr  -h  <^o)  —  sin(<^o) 

(xl 

> 

Wo 

= 

p^b 
i\  (T? 

sin(a;oAf  +  <Ao)^  1  <  i  <  M, 

Wo 

p^b 

cos(2wo  AT  -h  2<f)o) 

(tI 

2wo 

47r^  p^b 

cos(2wo  A/^  +  2(j>o)  —  cos  (200 

— 

2wo 

(tI 

sin(2wo  AT  +  20o) 

2wo 

In Fig. 2 the CRLBs of the FM and PPS parameters are reported for SNR = 12 dB, b = 0.05, $\omega_0$ = 0.015, and $\phi_0$ = 0. The solid and dashed lines refer to the exact and approximate CRLBs, respectively. The dashdotted lines refer to the exact CRLBs when only FM or only PPS parameters are present (let us call them marginal CRLBs). As we see, asymptotic behavior is reached at about N = 1000. The effect of coupling between the PPS and the FM parameters is evident by comparing the plots with solid and dashdotted lines. The results show that for N > 1000 the coupling tends to be negligible.


Figure  2.  Exact  (solid),  approximate  (dashed)  and 
marginal  (dashdotted)  CRLBs  vs.  N . 

To  illustrate  and  evaluate  the  various  parameter  estima¬ 
tion  algorithms  discussed  so  far,  we  simulated  numerically 
the  performance  by  Monte  Carlo  experiments  and  we  com¬ 
pared  them  with  the  CRLBs. 


Example 1: Narrowband FM signal and 3rd-order PPS.
We first generated N = 2,048 samples according to (1), with parameters: $\rho$ = 1, b = 0.05, $\omega_0$ = 0.015, $\phi_0$ = 0, $a_0$ = 0, $a_1$ = 0.25, $a_2$ = 1.3889 · 10^[exponent illegible], $a_3$ = 1.3022 · 10^{-5}. The noise variance $\sigma_v^2$ was set to obtain the desired signal-to-noise ratio, defined as SNR := $\rho^2/\sigma_v^2$. The sets of lags in the product ml-HAF were: $(\tau_{1,l}, \tau_{2,l})$ = (60, 60), (72, 50), (75, 48), (80, 45), (90, 40), and (100, 36), so that $\tau_{1,l}\cdot\tau_{2,l}$ = 3600, for l = 1, ..., 6; consequently $\omega_c$ = 1.1782 and $\beta_l$ = 0.7711, 0.7555, 0.7476, 0.7319, 0.6923, 0.6445, respectively. Note that all of them are less than one, so the second-order approximation is valid. All FFT operations were carried out by zero-padding to $N_{zp}$ = 2^[exponent illegible] points so that the frequency bins were small enough to allow accurate estimation of the peaks' location.
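
The values of $\omega_c$ and $\beta_l$ quoted for this example can be reproduced from (5) and (6). The following sketch assumes the forms $\omega_c = 2^M\pi a_M\prod_m\tau_m$ and $\beta = 2^M\pi b\prod_m\sin(\omega_0\tau_m)$, and takes $a_3 = 1.3022\cdot10^{-5}$, which is the value consistent with the stated $\omega_c$ = 1.1782; the function names are hypothetical.

```python
import math

def omega_c(M, a_M, lags):
    """Centre frequency of the ml-HAF, eq. (5): 2^M * pi * a_M * prod(tau_m)."""
    return 2 ** M * math.pi * a_M * math.prod(lags)

def beta(M, b, w0, lags):
    """Modulation index after the ml-HIM, eq. (6): 2^M * pi * b * prod(sin(w0 * tau_m))."""
    return 2 ** M * math.pi * b * math.prod(math.sin(w0 * t) for t in lags)

M, b, w0 = 3, 0.05, 0.015
a3 = 1.3022e-5                      # assumed exponent, consistent with omega_c = 1.1782
lag_sets = [(60, 60), (72, 50), (75, 48), (80, 45), (90, 40), (100, 36)]

wc = [omega_c(M, a3, lags) for lags in lag_sets]     # identical: same lag product
betas = [beta(M, b, w0, lags) for lags in lag_sets]  # all below one here
```

Since every lag set has the same product, $\omega_c$ is common to all of them, while $\beta_l$ changes with the individual lags; this is what lets the lags be tuned to reach the narrowband regime.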

In Fig. 3 we show the mean square errors (mse) of the estimates versus SNR, obtained from 400 independent Monte Carlo runs (in each run we generated N = 2,048 samples). The solid lines refer to the approximate CRLBs (CRLB($\omega_0$) and CRLB($a_m$) were properly scaled by $N^2$ and $N^{2m}$, respectively); the dashed lines correspond to the mse obtained assuming known amplitude $\rho$; and the dashdotted lines refer to the case where $\rho$ is unknown and is estimated using the following algorithm based on fourth order cumulants:


P  = 


JV-l 


iV-l 


2 


1/4 


(30) 


Unbiasedness and mean square consistency follow from the mixing conditions assumed to be satisfied for v(n); the proof is standard and is not reported here.

In the range of SNRs investigated, no appreciable difference was observed between the two cases (known $\rho$ and unknown $\rho$), except for the mse of $\hat{b}$. We also note that the mse's of $\hat{a}_3$ and $\hat{a}_2$ are not very close to the bounds. The reason is related to the amount of zero-padding [10]. Similar results have been obtained for the other parameters, but are not reported here due to space limitations. Finally, in Fig. 4 we report the mse's and the exact CRLBs for the estimators of b and $\omega_0$ versus N. As expected, when CRLB($\omega_0$) drops below $10^{-10}$ the mse is no longer able to track the bound, due to the limited FFT resolution.


Example 2: Wideband FM signal and 2nd-order PPS.
Here, we generated N = 1,024 samples of the FM-PPS process with second-order polynomial phase, and parameters given by: b = 6, $\omega_0$ = 0.0491, $\phi_0$ = 0, $a_0$ = 0.5, $a_1$ = 0.1, $a_2$ = 3.4722 · 10^{-4} and $\tau_1$ = 129, so $\omega_c$ = 0.5629 and $\beta$ = 3.6996. The modulation index was estimated according to the method reported in (14), proposed in [11]. Note that $\beta$ > 1, so we cannot use (17).

Tables I and II show biases and variances of the estimates for SNR = 16 dB (first two rows) and 8 dB (last two rows), obtained from 400 independent Monte Carlo runs (N = 1,024 samples per run). As we see from Table II, low SNR increases the variance of $\hat{b}$. Even if the estimator (14) guarantees reliable estimates of the modulation index for a pure FM tone observed in additive white Gaussian noise (see [11]), when it is applied to the data, the cross-terms between s(n) and v(n), present in the ml-HIM of x(n), contribute to the disturbance term, which consequently can no longer be considered a white or Gaussian process. This seems to have deleterious effects on the performance of (14), as shown in Table II.

A possible remedy is to use the estimate $\hat{\omega}_0$ to select a new set of lags in such a way that $\beta$ < 1, and then estimate b as in Step 6. We repeated the simulations, this time with $\tau_1$ = 256, so that $\beta$ = 0.2435, and we obtained
bias($\hat{b}$) = 0.1384, var($\hat{b}$) = 0.6407 for SNR = 16 dB, and bias($\hat{b}$) = 0.1469, var($\hat{b}$) = 3.0400 for SNR = 8 dB. The accuracy improvement (at least for the low SNR case) is now noticeable.

Figure 3. Mean square error for known $\rho$ (dashed), unknown $\rho$ (dashdotted), and CRLBs (solid) vs. SNR.

Figure 4. Mean square error (dashed) and CRLBs (solid) vs. N.


Table I - PPS parameters (SNR = 16 dB and 8 dB)

                 a_2                  a_1        a_0
Bias (16 dB)    -1.8236 · 10^[…]      0.0174    -0.2938
Var  (16 dB)     2.327 · 10^[…]       0.0028     0.0376
Bias (8 dB)     -4.3700 · 10^[…]     -0.0226    -0.4019
Var  (8 dB)      5.6254 · 10^[…]      0.0276     0.0765

Table II - FM parameters (SNR = 16 dB and 8 dB)

                 b                    omega_0              phi_0
Bias (16 dB)     […]                  […]                  0.0574
Var  (16 dB)     0.0460               2.7497 · 10^[…]      4.7490 · 10^[…]
Bias (8 dB)     -4.1559              -7.6639 · 10^[…]      0.0549
Var  (8 dB)      1.1172 · 10^[…]      3.1892 · 10^[…]      0.0106


4.  CONCLUSIONS 

Parameter estimation of a class of nonstationary complex signals whose phase can be modeled as a superposition of a polynomial term and a sinusoidal frequency modulated term was addressed. Previously proposed methods for estimating the FM parameters (and in particular the modulation index) do not include PPS components, and, vice versa, the standard approach based on the HAF for PPS parameter estimation cannot work when the FM component is present. The proposed method handles the more general scenario of hybrid FM-PPS signals by exploiting the properties of the multi-lag HAF. This approach allows one to decouple estimation of the FM parameters from that of the PPS, with a noticeable reduction in computational complexity. The redundancy offered by the multiple lags was also used to reduce the FM term to a narrowband process, and thereby improve the accuracy of the modulation index estimation. The exact Fisher information matrix for the FM-PPS parameters was also derived along with an asymptotic form of the CRLBs. Computer simulations were carried out to compare the performance of the proposed methods with the relevant CRLBs. The model adopted herein is potentially useful for practical radar/sonar modeling and target classification.
Abstract

A sparse second-order time-domain Volterra model is used to decompose a random (sea) wave train into its first- and second-order components. Extreme waves are shown to result from short-term phase locking of the first- and second-order components. The feasibility of using a wavelet-based bicoherence "spectrum" to detect the strong, but short lived, phase coupling is investigated. The results are encouraging and suggest the wavelet-based bicoherence is a topic worth considering further.


1.  Introduction 

Extreme waves are very large amplitude sea waves that are occasionally generated during severe storm conditions such as hurricanes. They are of immense practical importance, since such waves are capable of capsizing ships and offshore structures used in petroleum production. Despite their great importance, our understanding of such extreme waves is quite poor for two basic reasons. First, they are relatively rare, and second, they involve nonlinear phenomena. In this paper, we report on the application of a number of higher-order statistics based techniques that have been utilized to analyze and interpret extreme wave experimental data. We make use of both Fourier-based and wavelet-based higher order spectra, and frequency- and time-domain Volterra models, to provide new physical insight into important nonlinear wave interaction phenomena that underlie extreme wave generation. Such experimental insight has heretofore been unavailable.

Time  series  data  of  instantaneous  wave  elevation  were 
collected  from  a  series  of  probes  separated  one  meter  apart 
in  the  direction  of  propagation.  The  experiments  were  car¬ 
ried  out  in  a  model  basin  as  described  in  Sec.  2.  The  gener¬ 
ated  random  waves  were  unidirectional  and  were  generated 
in  such  a  way  as  to  model  (scale  1:54.5)  a  100-year  Gulf  of 
Mexico  storm.  In  Sec.  4  we  review  the  use  of  higher-order 


spectra  to  determine  a  frequency-domain  Volterra  model 
which  models  the  first-order  (i.e.,  linear)  and  second-order 
(i.e., quadratic) physics which govern wave propagation between 
the two probes. Since the waves at any point are non-Gaussian 
due to prior nonlinear wave interactions, determination 
of the Volterra kernels must take this into account. 
Thus  the  wave  statistics  must  be  characterized  by  a  hierar¬ 
chy  of  higher-order  spectra  up  to  4th  order.  Furthermore,  the 
model  must  be  orthogonal  (here  we  make  use  of  a  modified 
Gram-Schmidt  procedure)  to  facilitate  the  decomposition  of 
the  experimentally  observed  wave  power  spectrum  into  its 
constituent  first-order  and  second-order  components.  Such 
decomposition  indicates  that  most  of  the  wave  power  is  as¬ 
sociated  with  first-order  (i.e.,  linear)  phenomena,  while  the 
power  associated  with  second-order  effects  is  considerably 
smaller  and  resides  in  a  higher  frequency  band. 

Next,  in  Sec.  5,  we  utilize  the  same  experimental  time  se¬ 
ries  data  to  determine  a  sparse  time-domain  Volterra  model 
in order to gain insight into the temporal relationship between 
the first-order and second-order wave components. 
These  components  correspond  to  the  output  of  the  linear 
and  quadratic  Volterra  filters,  respectively.  The  results  indi¬ 
cate  a  second-order  component  oscillating  (the  wave  spec¬ 
trum  is  a  JONSWAP  spectrum  [1]  which  is  moderately  nar¬ 
row  band)  at  roughly  twice  the  frequency  of  the  first-order 
component.  However,  when  the  occasional  extreme  wave 
is  present,  the  relationship  between  first-  and  second-order 
components  changes  dramatically.  We  observe  in  this  case 
that  the  second-order  component  is  large  in  amplitude  and 
phase-locked  to  the  first-order  component  in  such  a  way  that 
constructive  interference  takes  place.  Such  constructive  in¬ 
terference  lasts  only  for  a  cycle  or  so  and  gives  rise  to  an  ex¬ 
treme  wave. 

Since  the  first-  and  second-order  components  are  phase- 
locked  over  a  relatively  short  period  of  time  and  since 
the  phase-locking  (or  phase  coupling)  is  relatively  weak 
over  most  of  the  remaining  experimental  time  series  record, 
Fourier-transform-based bicoherence spectra are not 
suitable for detecting such short-lived nonlinear interactions. 
For this reason, we utilize a wavelet-based bicoherence 
spectrum in Sec. 6 in an attempt to detect the short-lived 
but high degree of phase coupling associated with the 
generation  of  extreme  waves.  The  wavelet-based  bicoher¬ 
ence  results  are  quite  encouraging  and  suggest  this  may  be  a 
powerful  tool  with  which  to  detect  and  quantify  short-time- 
duration  nonlinear  events.  The  paper  is  summarized  in  Sec. 
7. 

2.  Experimental  Setup 

The  experiments  were  conducted  at  the  Offshore  Tech¬ 
nology  Research  Center’s  Model  Basin  located  at  College 
Station,  Texas.  The  data  utilized  in  this  paper  correspond  to 
unidirectional  random  waves,  with  a  JONSWAP  spectrum 
[1].  Three  probes,  separated  one  meter  apart,  were  utilized 
to  record  the  waves  as  they  propagate  past  the  probes.  An 
example of the wave elevation time series is shown in Fig. 1 
and its corresponding spectrum is shown in [2]. Of particular 
interest is the large-amplitude wave located at approx. 
5365  secs  in  Fig.  1,  and  the  extent  to  which  possible  nonlin¬ 
ear  interactions  between  first-  and  second-order  propagating 
wave  components  might  be  responsible  for  generating  such 
a  large-amplitude  extreme  wave. 

3.  Decomposition  Via  Volterra  Models 

To  quantify  the  relative  roles  of  first-  and  second-order 
wave  components  in  the  generation  of  large  amplitude  ex¬ 
treme  waves  we  must  first  decompose  the  observed  waves 
into  their  first-  and  second-order  components.  In  Sec.  4 
the decomposition is carried out in the frequency domain 
and in Sec. 5, in the time domain. The key idea is to take 
the  data  from  two  probes  and  construct  a  second-order  (i.e., 
quadratic)  Volterra  model  (see  Fig.  2),  where  the  linear  com¬ 
ponent  models  the  first-order  (i.e.,  linear)  wave  propagation 
physics  and  the  quadratic  component  models  second-order 
(i.e.,  quadratic)  propagation  phenomena.  To  the  extent  that 
we  have  a  good  model,  so  that  the  error  is  small  in  Fig.  2,  we 
may  regard  the  outputs  of  the  linear  and  quadratic  Volterra 
filters  as  a  decomposition  of  the  observed  random  waves 
into  their  first-  and  second-order  components. 

4.  Frequency-domain  Volterra  Model 

We have previously reported on a second-order frequency-domain 
orthogonal Volterra-like model of the first- and 
second-order wave physics [2]. Of particular interest are the 
plots  of  the  linear  and  quadratic  coherence  spectra  (see  Fig. 
4  in  [2]).  These  plots,  combined  with  the  autopower  spec¬ 
trum,  indicate  that  there  is  a  small  amount  of  second-order 
power oscillating at twice the characteristic frequencies 
associated with the waves. Since the second-order power 
is approximately two orders of magnitude smaller than the 
first-order power, one might conclude that second-order effects 
are not important. However, since determination of 
the higher-order spectral moments (necessary to compute 
the Volterra transfer functions) is based on the Fourier transform, 
no temporal information is available regarding a possible 
short-term interaction of first- and second-order wave 
components. Since the large-amplitude extreme wave in 
Fig. 1 is localized in time, we must utilize other approaches. 
Thus, in Sec. 5 we consider a time-domain Volterra model, 
and in Sec. 6 a wavelet-based bicoherence. 

5.  Time-domain  Volterra  Model 

Using  wave  elevation  time  series  data  from  the  two 
probes  we  next  construct  a  sparse  discrete  time-domain 
second-order  Volterra  model  described  by, 

$$y[n] = \sum_{i=0}^{N-1} h_1[i]\,x[n-i] + \sum_{i=0}^{N-1}\sum_{j=i}^{N-1} h_2[i,j]\,x[n-i]\,x[n-j] + e[n], \qquad (1)$$

where $h_1[i]$ and $h_2[i,j]$ correspond to the linear (first-order) 
and quadratic (second-order) Volterra kernels (filter coefficients), 
respectively. The terms $x[n]$ and $y[n]$ correspond to 
the  wave  elevation  time  series  recorded  at  the  upstream  and 
downstream  probes,  respectively.  Orthogonal  search  tech¬ 
niques  are  used  to  identify  the  most  significant  linear  and 
quadratic  filter  coefficients.  Of  a  total  of  30  linear,  and  465 
quadratic  coefficients,  the  orthogonal  search  identified  the 
9  most  significant  linear  and  15  most  significant  quadratic 
coefficients.  Thus  we  have  a  sparse  model.  Next  we  input 
the  upstream  wave  elevation  into  the  sparse  Volterra  model. 
Of  particular  interest  to  us  in  this  paper  is  the  output  of  the 
linear and quadratic filters. The relevant time traces for a 
third probe located one meter further downstream are shown 
in Fig. 3, where we have zoomed in on the region containing 
the large-amplitude extreme wave. Note that, with the 
exception  of  the  time  surrounding  the  extreme  wave,  the 
second-order  component  is  relatively  small  in  amplitude  rel¬ 
ative  to  the  first-order  component.  Furthermore,  the  linear 
component  rather  closely  models  the  observed  wave  eleva¬ 
tion.  However,  this  situation  changes  significantly  in  the 
presence  of  the  extreme  wave  in  that  the  first-order  wave 
component  underpredicts  the  wave  peak,  and  overpredicts 
the  depth  of  the  following  wave  trough.  Note,  however, 
that  the  second-order  wave  component  increases  in  ampli¬ 
tude  and  adds  to  the  first-order  component  to  correctly  pre¬ 
dict  the  amplitude  of  the  extreme  wave  (constructive  inter¬ 
ference).  Furthermore,  since  the  second-order  component 
oscillates at roughly twice the frequency of the first-order 
component, it is out of phase with the first-order component 
at the following wave trough. As a result, the summation 
of the out-of-phase first- and second-order components 
(destructive interference) correctly predicts the relatively 
shallow trough depth. These results demonstrate that 
the  generation  of  large-amplitude  extreme  waves  is  associ¬ 
ated  with  a  short-term  nonlinear  wave  interaction  character¬ 
ized  by  a  high-degree  of  phase  coupling  between  first-  and 
second-order  components.  These  experimental  results  agree 
closely  with  numerical  and  analytical  studies  of  such  phe¬ 
nomena  [3].  Furthermore,  these  results  provide  confidence 
that  the  orthogonal  search  technique  identified  those  linear 
and  quadratic  filter  coefficients  which  are  most  important  in 
terms  of  modeling  the  linear  and  nonlinear  physics. 
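
The decomposition described in this section can be sketched in a few lines of Python. This is only an illustration of how the linear and quadratic outputs of a sparse model of the form (1) are formed once the retained coefficients are known; the function name `volterra_decompose` and the dictionary layout of the kernels are assumptions made here, and the orthogonal search that selects the coefficients is not shown.

```python
def volterra_decompose(x, h1, h2):
    """Split a sparse second-order Volterra filter output into its parts.

    x  : sequence of upstream samples x[n]
    h1 : dict {i: h1[i]} of retained linear coefficients
    h2 : dict {(i, j): h2[i, j]} of retained quadratic coefficients, j >= i
    Returns (linear, quadratic) lists; samples before n = 0 are taken as zero.
    """
    def past(n, k):
        # x[n - k], or zero when the index falls before the record starts
        return x[n - k] if n - k >= 0 else 0.0

    linear = [sum(c * past(n, i) for i, c in h1.items())
              for n in range(len(x))]
    quadratic = [sum(c * past(n, i) * past(n, j) for (i, j), c in h2.items())
                 for n in range(len(x))]
    return linear, quadratic
```

Summing the two returned sequences gives the model prediction of the downstream elevation; comparing them sample by sample is what reveals the short interval in which the two components add constructively.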

6.  Wavelet-based  Bicoherence 


Since  the  phase  locking  (or  phase  coupling)  associated 
with  very  large-amplitude  extreme  waves  occurs  over  a  rela¬ 
tively  short  time  period,  Fourier-based  bicoherence  analysis 
fails  to  detect  the  short-term  nonlinear  interaction.  For  this 
reason  we  employ  a  wavelet-based  bicoherence  time-scale 
“spectrum”  first  put  forward  in  [4].  We  utilize  the  continu¬ 
ous  wavelet  transform  defined  by 


$X_w(a,\tau)$ 


(2) 


where $x(t)$ is the wave elevation time series data, and $a$ is 
the variable scale. The quantity $\psi(t)$ is a Morlet wavelet 
defined by 

(3) 

which  obviously  corresponds  to  a  complex  exponential  mul¬ 
tiplied  by  a  Gaussian. 
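
As a rough illustration of such a transform, the sketch below evaluates a Morlet wavelet transform at one scale by direct summation. The specific choices here are assumptions and are not taken from the paper: the wavelet is taken as $\exp(j\omega_0 u)\exp(-u^2/2)$ with $\omega_0 = 2\pi$, the transform uses the conjugated wavelet with a $1/\sqrt{a}$ normalization, and the names `morlet` and `cwt_morlet` are hypothetical.

```python
import cmath
import math

def morlet(u, omega0=2.0 * math.pi):
    """Complex exponential multiplied by a Gaussian (assumed normalization)."""
    return cmath.exp(1j * omega0 * u) * math.exp(-0.5 * u * u)

def cwt_morlet(x, a, dt=1.0, omega0=2.0 * math.pi):
    """Wavelet coefficients at scale a for every sample shift tau (direct sum)."""
    n = len(x)
    norm = dt / math.sqrt(a)
    return [norm * sum(x[m] * morlet((m - k) * dt / a, omega0).conjugate()
                       for m in range(n))
            for k in range(n)]
```

With $\omega_0 = 2\pi$ the wavelet at scale $a$ is centered near frequency $1/a$, so a sinusoid of frequency 0.1 cycles per sample responds most strongly at $a = 10$.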

Next,  following  [5],  we  define  the  wavelet-based  bispec¬ 
trum  as 


$$B^w(a_1,a_2) = \int_T X_w^*(a,\tau)\,X_w(a_1,\tau)\,X_w(a_2,\tau)\,d\tau \qquad (4)$$

where $T$ is the time duration over which the wavelet-based 
bispectrum is computed. The scales $a$, $a_1$, and $a_2$ satisfy the 
scale selection rule 

$$\frac{1}{a} = \frac{1}{a_1} + \frac{1}{a_2}. \qquad (5)$$

This is analogous to the frequency selection rule ($f = f_1 + f_2$) 
associated with Fourier-based bispectral analysis of stationary 
data, and with the frequency selection rule ($f = f_1 + f_2$) 
associated with frequency mixing resulting from 
quadratic nonlinear interactions. Since the Morlet wavelet 
has a well defined peak in the frequency domain, frequency 
and scale are inversely related (i.e., $f \propto a^{-1}$). The wavelet-based 
auto-bicoherence is given by analogy to [5] (see also [4]) as 

$$\left(b^w(a_1,a_2)\right)^2 = \frac{\left|B^w(a_1,a_2)\right|^2}{\left(\int_T \left|X_w(a_1,\tau)X_w(a_2,\tau)\right|^2 d\tau\right)\left(\int_T \left|X_w(a,\tau)\right|^2 d\tau\right)}. \qquad (6)$$

Next the wavelet-based bicoherence was computed for 
several wave-elevation segments of 150 secs duration in the 
vicinity of the large-amplitude extreme wave. Plots of the 
time series data associated with a selected segment, and the 
corresponding wavelet-based bispectrum and bicoherence 
are shown in Figs. 4 and 5. Note that Fig. 5 contains the 
large-amplitude extreme wave shown in Fig. 1. Observe 
that the peaks in the wavelet-based bispectrum and bicoherence 
are fairly broad, due both to the non-narrow 
bandwidth of the random sea waves and the finite time duration 
of the analyzing wavelets. Of particular significance is 
the fact that the maximum vertical scale for the bicoherence 
is greater in Fig. 5 than in Fig. 4. This suggests 
that the wavelet-based bicoherence is sensitive to the short-term 
but high degree of phase coupling characteristic of the 
strong nonlinear interaction associated with the generation 
of the large-amplitude extreme wave in the 150 secs segment 
starting at 5250 secs. 

Next we follow a procedure used in studies of plasma 
turbulence, whereby we sum the wavelet-based bicoherence 
over the upper triangle in the bifrequency plane. Low frequencies 
are excluded because the wavelet-based bicoherence 
contains several spurious peaks due to the fact that 
there is virtually no power in the JONSWAP spectrum at 
these low frequencies. The summed bicoherence values for 16 
overlapping (50%) segments of 150 sec duration are tabulated 
in Table 1. The values are normalized to the largest 
value, associated with the segment extending from 5250 
to 5400 secs. This, of course, is the segment containing 
the largest amplitude wave. All other segments have 
a lower value of summed bicoherence, thereby suggesting 
that the wavelet-based bicoherence is indeed sensitive to the 
high degree of phase coupling characteristic of the short-time 
strong nonlinear interactions associated with extreme 
waves. 


7. Conclusion 


The  results  on  the  use  of  wavelet-based  bicoherence 
“spectra”  to  detect  strong,  but  short-lived,  phase  coupling 
associated  with  the  generation  of  extreme  waves  are  of  a 
preliminary  nature.  Nevertheless,  the  early  results  are  suf¬ 
ficiently  encouraging  to  warrant  further  investigation  of  this 
technique. 

Start time [sec]    Normalized summed bicoherence 
4500                0.5753 
4575                0.5604 
4650                0.4504 
4725                0.4045 
4800                0.3969 
4875                0.4185 
4950                0.4813 
5025                0.4486 
5100                0.3264 
5175                0.3656 
5250                1.00 
5325                0.8745 
5400                0.6807 
5475                0.8856 
5550                0.5797 

Table 1. Table of the summed bicoherence for 
50% overlapping time intervals of 150 sec duration. 

Figure 1. A portion of the wave elevation time 
trace containing a large-amplitude extreme 
wave at ≈ 5365 secs. 


Figure  2.  A  second-order  Volterra  model. 


Figure 3. True, that is, observed, wave elevation 
at downstream probe, wave elevation 
“predicted” by a sparse second-order 
Volterra model, and the 1st-order and 2nd-order 
wave components in the vicinity of the 
large-amplitude extreme wave. 

Figure 4. Time-trace, wavelet based bispectrum, 
and wavelet based bicoherence for the 
time interval 4800 to 4950 secs. 


Figure 5. Time-trace, wavelet based bispectrum, 
and wavelet based bicoherence for the 
time interval 5250 to 5400 secs, the interval 
containing the large-amplitude extreme wave. 

ABSTRACT 

The high-order ambiguity function (HAF) is an effective tool 
for retrieving coefficients of polynomial phase signals (PPS). 
The  lag  choice  is  dictated  by  conflicting  requirements:  a 
large  lag  improves  estimation  accuracy  but  drastically  limits 
the  range  of  the  parameters  that  can  be  estimated.  By  us¬ 
ing  two  (large)  co-prime  lags  and  solving  linear  Diophantine 
equations  using  the  Euclidean  algorithm,  we  are  able  to  re¬ 
cover  the  PPS  coefficients  from  aliased  peak  positions  with¬ 
out  compromising  the  dynamic  range  and  the  estimation  ac¬ 
curacy.  Separating  components  of  a  multi-component  PPS 
whose  phase  polynomials  have  very  similar  leading  coeffi¬ 
cients  has  been  a  challenging  task,  but  can  now  be  tackled 
easily  with  the  two-lag  approach.  Numerical  examples  are 
presented  to  illustrate  the  effectiveness  of  our  method. 

1.  INTRODUCTION 

Polynomial  phase  signals  (PPS)  have  been  studied  exten¬ 
sively  in  recent  years  as  a  means  of  modeling  certain  non¬ 
stationary  processes  [2],  [4],  [9],  [10].  One  of  the  primary 
motivations  for  studying  PPS  comes  from  Doppler  radar  ap¬ 
plications.  Although  the  continuous-time  transmitted  radar 
signal  does  not  have  polynomial  phase,  the  samples  taken 
at  the  matched  filter  output  of  a  pulsed  radar  system  give 
rise to a discrete-time PPS when the target is moving (see 
e.g., [7, pp. 58-65]). The polynomial coefficients are then 
related  to  the  kinetic  parameters  of  the  target  [7,  p.  59]. 
PPS  modeling  of  SAR  images  has  also  appeared  in  [6],  [3]. 

A  discrete-time  single  component  PPS  of  order  M  has 
the  form 

$$y(t) = A \exp\Big\{ j \sum_{m=0}^{M} a_m t^m \Big\}. \qquad (1)$$

Set $y^{(*q)}(t) = y(t)$ for even $q$ and $y^{(*q)}(t) = y^*(t)$ for odd $q$, 
where $y^*(t)$ denotes the conjugate of $y(t)$. Peleg and Porat 
have shown that the high-order instantaneous moment (see 
[5, Ch. 12]), 

$$\mathcal{P}_M[y(t);\tau] \triangleq \prod_{q=0}^{M-1} \left[ y^{(*q)}(t - q\tau) \right]^{\binom{M-1}{q}}, \qquad (2)$$

reduces the Mth-order PPS in (1) to a single harmonic at 
frequency $\omega = M!\,a_M\tau^{M-1}$. The generalized FS coefficient 
of $\mathcal{P}_M[y(t);\tau]$, defined as 

$$P_M[y;\alpha,\tau] = \lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathcal{P}_M[y(t);\tau]\, e^{-j\alpha t}, \qquad (3)$$

is referred to as the high-order ambiguity function (HAF). 
The magnitude of (3) peaks at $\alpha = \omega$. Finding this peak 
thus enables us to estimate the coefficient $a_M$. Once $a_M$ is 
found, we demodulate $y(t)$ by $\exp\{-j a_M t^M\}$ to reduce the 
order of the PPS by one, and repeat the above procedure 
until all remaining coefficients of the phase polynomial are 
found. 

To avoid aliasing in estimating $a_M$, one must limit $\omega$ 
to within $[-\pi, \pi)$ and hence $a_M$ to within $[-\epsilon\pi, \epsilon\pi)$ with 
$\epsilon = 1/(M!\,\tau^{M-1})$. Therefore $\tau$ has to be small in order for 
the dynamic range of $a_M$ to be reasonably large. On the 
other hand, noise is often present in physical systems and 
we observe $x(t) = y(t) + v(t)$, where $v(t)$ is assumed to 
be zero-mean. The coefficient $a_M$ is then estimated using 
$\mathcal{P}_M[x(t);\tau]$ in place of $\mathcal{P}_M[y(t);\tau]$. It is shown by Peleg 
that in the presence of noise, the best estimate $\hat{a}_M$ of $a_M$, 
in the sense of minimizing the asymptotic variance of $\hat{a}_M$, is 
attained by taking $\tau = T/M$ for $M = 2, 3$, or $\tau = T/(M+2)$ 
for $4 \le M \le 10$, see [5, p. 398]. Taking such a large $\tau$ will 
drastically limit the dynamic range of $a_M$. A dilemma is 
thus reached here regarding our choice of $\tau$. 
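
To make the aliasing issue concrete, the following sketch computes the peak of the second-order HAF ($M = 2$) on a DFT grid for a given lag. It is an illustration only: the name `haf2_peak` and the plain-sum DFT are choices made here, the $1/T$ normalization is omitted because it does not move the peak, and the peak is located only to within the grid spacing.

```python
import cmath
import math

def haf2_peak(x, tau, n_grid=None):
    """Peak location of |P_2[x; alpha, tau]| on a DFT grid, wrapped to [-pi, pi)."""
    T = len(x)
    n_grid = n_grid or T
    # Second-order instantaneous moment x(t) x*(t - tau), t = tau, ..., T-1
    prod = [x[t] * x[t - tau].conjugate() for t in range(tau, T)]
    best_k, best_mag = 0, -1.0
    for k in range(n_grid):
        alpha = 2.0 * math.pi * k / n_grid
        s = sum(p * cmath.exp(-1j * alpha * t)
                for t, p in enumerate(prod, start=tau))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    alpha = 2.0 * math.pi * best_k / n_grid
    return (alpha + math.pi) % (2.0 * math.pi) - math.pi
```

For a noise-free chirp with $a_2 = 0.1$ and $T = 512$, a lag of 1 gives a peak near $2a_2\tau = 0.2$, whereas a lag of 256 gives a peak near 0.93, which is $51.2$ wrapped into $[-\pi, \pi)$.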

2.  JOINT  PARAMETER  ESTIMATION  USING 
CO-PRIME  LAGS 

The above dilemma is easily resolved by using two large lags 
$\tau_1$ and $\tau_2$ that are co-prime. Using our approach, the identifiability 
condition for $a_M$ (i.e., the range of $a_M$ that can be 
recovered uniquely) is $[-\pi/M!, \pi/M!)$. This offers a much 
larger dynamic range than $a_M \in [-\epsilon\pi, \epsilon\pi)$, $\epsilon = 1/(M!\,\tau^{M-1})$, 
when only one lag is used. Using two large lags, the peaks 
are likely to be at aliased positions, but the correct $a_M$ can 
nonetheless be recovered. Both the theory and simulations 
confirm  that  while  a  much  relaxed  identification  condition 
is  maintained,  our  estimation  accuracy  is  at  least  as  good 
as  that  with  one  large  lag.  We  wish  to  point  out  here  that 
multiple  lags  are  also  used  in  [1]  to  improve  accuracy  and 
resolve  identifiability  problems  with  multicomponent  PPS. 

We now describe our method. Let $y(t)$ be a PPS of order 
$M$. Observe that for any given lag $\tau$, the peak location 
of $P_M[y;\alpha,\tau]$ is at $M!\,a_M\tau^{M-1}$, in theory. But due to the 
inherent wrap-around effect of the DFT, we actually observe 
the peak location at $b \in [-\pi, \pi)$, with $M!\,a_M\tau^{M-1} = b + 2\pi k$ 
for some integer $k$. Suppose that two co-prime lags $\tau_1$, $\tau_2$ are 
now used, from which we obtain two (possibly aliased) 
peak locations $b_1, b_2 \in [-\pi, \pi)$ from the HAFs. Hence there 
exist $k_1, k_2 \in \mathbb{Z}$ such that 

$$M!\,a_M\tau_1^{M-1} = 2\pi k_1 + b_1, \qquad (4)$$

$$M!\,a_M\tau_2^{M-1} = 2\pi k_2 + b_2. \qquad (5)$$

Multiplying (4) by $\tau_2^{M-1}$ and (5) by $\tau_1^{M-1}$ and then subtracting 
the resulting equations, we obtain 

$$k_2\tau_1^{M-1} - k_1\tau_2^{M-1} = \frac{b_1\tau_2^{M-1} - b_2\tau_1^{M-1}}{2\pi} =: N. \qquad (6)$$

Note that $N$ in the above equation is an integer, in theory, 
but may not be so in the presence of noise, and we round it 
off to the nearest integer in that case. Next, we show how 
the condition $M!\,a_M \in [-\pi, \pi)$ and the co-primeness of $\tau_1$, 
$\tau_2$ allow us to uniquely solve for $k_1$ and $k_2$. 

First we find two integers $n_1$, $n_2$ such that 
$n_2\tau_1^{M-1} - n_1\tau_2^{M-1} = 1$ using the Euclidean algorithm, which is a standard 
algorithm in number theory, see e.g. [8] or the Appendix. 
It is well known that the solutions $k_1$, $k_2$ of the 
linear Diophantine equation (6) are completely characterized 
by 

$$k_1 = n_1 N + l\,\tau_1^{M-1}, \quad k_2 = n_2 N + l\,\tau_2^{M-1}, \qquad (7)$$

where $l \in \mathbb{Z}$. Since $k_1$ increases by $\tau_1^{M-1}$ every time $l$ 
increases by 1, there exists a unique $k_1$ such that $2\pi k_1 + b_1 \in 
[-\pi\tau_1^{M-1}, \pi\tau_1^{M-1})$. Denote this $k_1$ by $k_1^*$. It follows from 
the identifiability condition $M!\,a_M \in [-\pi, \pi)$ and (4) that 

$$M!\,a_M\tau_1^{M-1} = 2\pi k_1^* + b_1, \quad \text{and} \quad a_M = \frac{2\pi k_1^* + b_1}{M!\,\tau_1^{M-1}}. \qquad (8)$$


We now summarize the two-lag algorithm for estimating 
$a_M$ below: 

Step 1. Choose co-prime integers $\tau_1$ and $\tau_2$ that are close 
to the optimal lag: $T/M$ for $M = 2, 3$, or $T/(M+2)$ 
for $4 \le M \le 10$. Assume that $\tau_1 > \tau_2$. Find the 
(possibly aliased) peak locations $b_1, b_2 \in [-\pi, \pi)$ from 
$|P_M[x;\alpha,\tau_1]|$ and $|P_M[x;\alpha,\tau_2]|$, respectively. 

Step 2. Find two integers $n_1$, $n_2$ satisfying 
$n_2\tau_1^{M-1} - n_1\tau_2^{M-1} = 1$ using the Euclidean algorithm. 

Step 3. Solve for the unique $k_1 = n_1 N + l\,\tau_1^{M-1}$ with 
$l \in \mathbb{Z}$ such that $2\pi k_1 + b_1 \in [-\pi\tau_1^{M-1}, \pi\tau_1^{M-1})$, where $N$ is 
defined in (6). Let 

$$\hat{a}_M = \frac{2\pi k_1 + b_1}{M!\,\tau_1^{M-1}}.$$

Finally, we remark that there are easy ways to find $\tau_1$ 
and $\tau_2$ that are co-prime. For example, one may take any 
two consecutive integers, or take one of them a power of 2 
while the other an odd integer. 
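
The three steps can be sketched as follows, starting from two already measured peak locations. This is a minimal illustration under stated assumptions rather than the authors' implementation: the name `two_lag_estimate` is hypothetical, the integer $N$ is formed as $(b_1\tau_2^{M-1} - b_2\tau_1^{M-1})/(2\pi)$ (the difference obtained by eliminating $a_M$ from the two aliased peak equations), and Python's built-in modular inverse `pow(p2, -1, p1)` (Python 3.8 or later) stands in for a hand-written Euclidean algorithm.

```python
import math

def two_lag_estimate(b1, b2, tau1, tau2, M=2):
    """Recover a_M from aliased HAF peak locations b1, b2 in [-pi, pi)."""
    p1, p2 = tau1 ** (M - 1), tau2 ** (M - 1)
    if math.gcd(p1, p2) != 1:
        raise ValueError("lags must be co-prime")
    # Integer N, rounded to absorb small errors in the peak locations
    N = int(round((b1 * p2 - b2 * p1) / (2.0 * math.pi)))
    # k2*p1 - k1*p2 = N implies k1 = -N * p2^{-1} (mod p1)
    k1 = (-N * pow(p2, -1, p1)) % p1
    # Pick the representative with 2*pi*k1 + b1 in [-pi*p1, pi*p1)
    w = 2.0 * math.pi * k1 + b1
    k1 -= p1 * math.floor((w + math.pi * p1) / (2.0 * math.pi * p1))
    return (2.0 * math.pi * k1 + b1) / (math.factorial(M) * p1)
```

The rounding of $N$ only succeeds when the errors in $b_1$ and $b_2$, scaled by the lags, stay below half a unit, so in practice the peaks would presumably need to be located more finely than the raw DFT grid spacing.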


3. APPLICATION TO MULTI-COMPONENT PPS 

The  HAF  method  is  equally  effective  in  estimating  param¬ 
eters  of  a  multi-component  PPS.  Although  cross-terms  are 
created in $\mathcal{P}_M[y(t);\tau]$ when $y(t)$ has multiple components, 
it is shown in [9] that these cross-terms do not generally give 
rise to peaks in the corresponding HAF $P_M[y;\alpha,\tau]$. Therefore 
the HAF method works for multi-component PPS. Our 
two-lag  approach  thus  offers  the  same  advantage  in  this  set¬ 
ting. 

In the past, the HAF has been problematic for multi-component 
PPS in which two or more leading phase coefficients 
are very close. As an added advantage, the two-lag 
approach solves this problem with ease. To illustrate that, 
let $y(t)$ be an $M$-th order two-component PPS and assume 
that the two leading coefficients $a_{M,1}$, $a_{M,2}$ of the phase 
polynomials are very close. When $\tau$ is small, the peak locations 
$M!\,a_{M,1}\tau^{M-1}$ and $M!\,a_{M,2}\tau^{M-1}$ of $P_M[y;\alpha,\tau]$ are 
indistinguishable and one may be misled to believe that 
there is only one component present. However, by choosing 
a large $\tau$, we magnify the difference between $a_{M,1}$ and $a_{M,2}$ 
and thus separate the two peaks. The two-lag method 
then recovers $a_{M,1}$ and $a_{M,2}$. 

We wish to point out that there is a minor complication in 
the actual implementation of the algorithm. Suppose that 
$\tau_1$, $\tau_2$ are co-prime and for each $\tau_i$ we find two (possibly 
aliased) peak locations $b_{1,i}$, $b_{2,i}$ of the HAF $P_M[y;\alpha,\tau_i]$. 
It is not immediately clear which peaks from $\tau_1$ correspond 
to those from $\tau_2$. There are two possible pairings: 
$\{(b_{1,1}, b_{1,2}), (b_{2,1}, b_{2,2})\}$ or $\{(b_{1,1}, b_{2,2}), (b_{2,1}, b_{1,2})\}$. But notice 
that according to (6), $[b_{i,1}\tau_2^{M-1} - b_{j,2}\tau_1^{M-1}]/(2\pi)$ must 
be an integer if $(b_{i,1}, b_{j,2})$ is to be a correct pair. We choose 
the pairing for which the above expressions are closest to 
integers. Pairing ambiguity is thus resolved. Once 
the leading coefficients are found, we follow the procedure 
delineated in [9] to find all remaining parameters. 
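
One way to mechanize this pairing rule is sketched below. The function name `pair_peaks` and the exhaustive search over permutations are illustrative choices made here; the criterion is the one stated above, namely that for a correct pair the quantity formed from a lag-$\tau_1$ peak times $\tau_2^{M-1}$ minus a lag-$\tau_2$ peak times $\tau_1^{M-1}$, divided by $2\pi$, should be close to an integer.

```python
import math
from itertools import permutations

def pair_peaks(peaks1, peaks2, tau1, tau2, M=2):
    """Match peaks found with lag tau1 to peaks found with lag tau2."""
    p1, p2 = tau1 ** (M - 1), tau2 ** (M - 1)

    def miss(bi, bj):
        # Distance of (bi*p2 - bj*p1)/(2*pi) from the nearest integer
        v = (bi * p2 - bj * p1) / (2.0 * math.pi)
        return abs(v - round(v))

    best = min(permutations(range(len(peaks2))),
               key=lambda perm: sum(miss(peaks1[i], peaks2[j])
                                    for i, j in enumerate(perm)))
    return [(peaks1[i], peaks2[j]) for i, j in enumerate(best)]
```

For two components there are only the two candidate pairings discussed above, so the search is trivial; each matched pair can then be passed to the two-lag recovery separately.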


4.  SIMULATIONS 

We  present  here  two  numerical  examples  to  illustrate  the 
two-lag  algorithm. 

Example 1. We have $T = 512$ samples of a chirp $x(t) = 
\exp(j a_2 t^2) + v(t)$, where $a_2 = 0.1$ and $v(t)$ is zero-mean 
i.i.d. complex Gaussian noise with variance 0.5. For a given 
lag $\tau$, we form $P_2[x;\alpha,\tau]$ by taking the normalized (by data 
length $T$) DFT of the product process $x^*(t-\tau)x(t)$. Method 
I uses a single lag $\tau = 1$, and the corresponding $2a_2\tau = 0.2$ 
is free of the aliasing effect. The bias and standard deviation 
of the estimate $\hat{a}_2$ calculated from 100 independent realizations 
were 4.5219 x 10”® and 5.7433 x 10“®, respectively. 
Method II uses two lags $\tau_1 = T/2 = 256$ and $\tau_2 = 201$, and 
aliasing occurs for both lags. We recover $a_2$ using the previously 
outlined two-lag algorithm, by solving the linear Diophantine 
equation (6). The bias and the standard deviation 
of our estimate $\hat{a}_2$ calculated from the same 100 independent 
realizations were —1.1679 x 10“^ and 1.5475 x 10“®, 
respectively. Using the two-lag approach, we have gained 
an order of magnitude improvement in both the bias and 
the standard deviation! 




Example  2.  We  have  T  =  512  samples  of 
x{t)  =  1.2  +  ei(o.40u2_,) 


where $v(t)$ is zero-mean i.i.d. complex Gaussian noise 
with variance 0.5. Figure 1 illustrates the magnitude of 
$P_2[x;\alpha,\tau]$ for different $\tau$'s and Table 1 shows the corresponding 
peak locations. A small lag such as $\tau_1 = 1$ reveals 
a single peak, but a large lag has the ability to split the 
peak into two, showing that there are actually two chirps. 
From the four peak locations obtained using two large co-prime 
lags $\tau_2$ and $\tau_3$, we recover the chirp coefficients $a_{1,2}$ 
and $a_{2,2}$. For the realization shown in Figure 1, we obtain 
$a_{1,2} = 0.40000115150160$, $a_{2,2} = 0.40099542805416$, 
which are very accurate. We have generated 100 independent 
realizations, and have found the bias and the standard 
deviation in $a_{1,2}$ to be —1.6063 x 10“® and 1.4883 x 10”®, 
respectively, and the bias and the standard deviation in $a_{2,2}$ 
to be —2.0721 x 10”® and 1.8729 x 10”®, respectively. 


5.  CONCLUSIONS 

The  high-order  ambiguity  function  has  been  used  frequently 
to estimate parameters of polynomial phase signals (PPS). 
It is well known that large lags offer better estimation accuracy 
but limit the range of parameters that can be identified. 
We demonstrate that by using two (large) co-prime 
lags  and  solving  linear  Diophantine  equations  by  way  of 
the  Euclidean  algorithm,  we  can  recover  the  true  PPS  co¬ 
efficients  from  aliased  positions.  Good  estimation  accuracy 
and  dynamic  range  of  the  PPS  coefficients  are  simultane¬ 
ously  achieved  with  this  method.  The  two-lag  method  for 
multi-component  PPS  parameter  estimation  offers  the  same 
advantage  in  accuracy  and  dynamic  range,  even  when  the 
leading  coefficients  of  the  phase  polynomials  are  very  close, 
because we are free to choose large lags $\tau$ to magnify their difference. 


6.  APPENDIX:  LINEAR  DIOPHANTINE 
EQUATIONS  AND  THE  EUCLIDEAN 
ALGORITHM 


Let $a > b > 0$ be co-prime integers and $N \in \mathbb{Z}$. The linear 
Diophantine equation 

$$ax + by = N, \qquad (9)$$

can be solved easily using the Euclidean algorithm, which 
finds a pair of integers $x_0$, $y_0$ such that $ax_0 + by_0 = 1$. 
Details on linear Diophantine equations and the Euclidean 
algorithm can be found in textbooks on elementary number 
theory, see e.g. [8]. We illustrate the Euclidean algorithm 
here by way of an example with $a = 256$, $b = 191$. 

First, we carry out the following successive divisions until 
the remainder is 1: 

256 = 1 × 191 + 65, 
191 = 2 × 65 + 61, 
65 = 1 × 61 + 4, 
61 = 15 × 4 + 1. 

Reverse the process to obtain 

1 = 61 − 15 × 4 
  = 61 − 15 × (65 − 1 × 61) 
  = 16 × 61 − 15 × 65 
  = 16 × (191 − 2 × 65) − 15 × 65 
  = 16 × 191 − 47 × 65 
  = 16 × 191 − 47 × (256 − 1 × 191) 
  = −47 × 256 + 63 × 191. 

Therefore we have found $x_0 = -47$ and $y_0 = 63$ such that 
$ax_0 + by_0 = 1$. Once $x_0$ and $y_0$ are found, the solutions to 
the linear Diophantine equation (9) are given by 

$$x = N x_0 + b\,l, \quad y = N y_0 - a\,l, \quad \forall\, l \in \mathbb{Z}. \qquad (10)$$

The  Euclidean  algorithm  converges  very  fast.  In  fact,  it 
can  be  shown  that  the  number  of  steps  is  at  most  6.65  times 
the  number  of  digits  in  b  [8,  p.  29]. 
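
The worked example above can be checked with a short recursive version of the extended Euclidean algorithm. The function names below are illustrative; the solution formula is the one in (10), assuming the right-hand side of (9) is $N$.

```python
def ext_gcd(a, b):
    """Return (g, x0, y0) with a*x0 + b*y0 == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    # Back-substitute: g = b*x + (a % b)*y = a*y + b*(x - (a // b)*y)
    return g, y, x - (a // b) * y

def solve_linear_diophantine(a, b, N, l=0):
    """One solution (x, y) of a*x + b*y == N for co-prime a, b, indexed by l."""
    g, x0, y0 = ext_gcd(a, b)
    if g != 1:
        raise ValueError("a and b must be co-prime")
    return N * x0 + b * l, N * y0 - a * l
```

Calling `ext_gcd(256, 191)` reproduces the pair $x_0 = -47$, $y_0 = 63$ found by hand, and varying `l` walks through the full solution family.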
Figure 1. The ambiguity function (HAF of order 2) of a two-component chirp for various $\tau$'s. 

Abstract 

A  method  is  proposed  for  the  restoration  of  linearly 
blurred  two-tone  images  without  requiring  the  knowl¬ 
edge  of  the  blur  parameters.  The  method  jointly  esti¬ 
mates  the  original  image  and  the  blur  parameters  based 
on  some  statistical  parameters  at  the  output  of  an  in¬ 
verse  filter.  Unlike  some  other  blind  image  restoration 
procedures,  the  proposed  method  does  not  require  the 
estimation  or  modeling  of  the  statistical  properties  of 
the original image, yet can be justified even for 
non-i.i.d. images. 

1.  Introduction 

In typical automatic recognition and identification systems, the
input images are usually assumed to be blur-free. Sometimes,
blurred images are encountered due to the impairment of imaging
systems and environment. If the blur is not properly removed, the
recognizer's performance can severely deteriorate. For example, if
the blurred text image in Fig. 1 is segmented, without deblurring,
for automatic character recognition, the result could become
unrecognizable even to trained human eyes (see Fig. 2).

In this article we are concerned with the restoration of linearly
blurred two-tone images such as texts and bar codes [1]. We propose
a restoration method that requires no prior knowledge of the blur
parameters or the statistical properties of the original images.
The method is shown to be able to handle not only images with
i.i.d. (independent and identically distributed) pixels but also
images with non-i.i.d. pixels that are often encountered in
reality. Due to the use of higher-order statistics [2], the method
is also capable of handling minimum as well as non-minimum phase
blur filters. These features are very desirable for image
restoration problems.

This work is an extension of some previous works on blind
deconvolution (or equalization) of 1-D signals in the digital
communications context [4], [5]. One of the major differences is
that, unlike the digital communications problem where the alphabet
of the transmitted digital signal is known a priori at the
receiver, the two tones of the original image in the image
deblurring problem usually are not available. The method proposed
in this article is able to take advantage of the two-tone
information without requiring knowledge of the specific values of
the tones: they are estimated from the blurred image jointly with
the blur filter and the original image.

2.  Deblurring  Method 

For linear degradations, the blurred image, denoted by
$\mathcal{Y} := \{Y(m,n)\}$, can be regarded as the output from a
blur filter

$\mathcal{Y} = \mathcal{B} \otimes \mathcal{X},$

where $\mathcal{B} := \{b(j,k)\}$ is the point-spread function
(PSF) of the blur filter, $\mathcal{X} := \{X(m,n)\}$ is the
original image, and $\otimes$ stands for two-dimensional
convolution.

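To deblur the image $\mathcal{Y}$ when the blur filter
$\mathcal{B}$ is not available, we filter $\mathcal{Y}$ with an
inverse filter $\mathcal{F} := \{f(j,k)\}$ to obtain
$\mathcal{Z} := \{Z(m,n)\}$, where

$\mathcal{Z} = \mathcal{F} \otimes \mathcal{Y}.$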

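In this article, we propose the following cost function for the
selection of $\mathcal{F}$:

$J(\mathcal{F}) := \min_{a_2 = 1} a^T H(\mathcal{Z})\, a,$

where $a := [a_0, a_1, a_2]^T$ is a constant vector and
$H(\mathcal{Z})$ is a three-by-three matrix

$H(\mathcal{Z}) := \begin{bmatrix} 1 & \mu_1(\mathcal{Z}) & \mu_2(\mathcal{Z}) \\ \mu_1(\mathcal{Z}) & \mu_2(\mathcal{Z}) & \mu_3(\mathcal{Z}) \\ \mu_2(\mathcal{Z}) & \mu_3(\mathcal{Z}) & \mu_4(\mathcal{Z}) \end{bmatrix},$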

0-8186-80D5-9/97  $10.00  ©  1997  IEEE 




with $\mu_i(\mathcal{Z})$ defined by

$\mu_i(\mathcal{Z}) := E\{Z^i(m,n)\},$

being the $i$th statistical moment of $\mathcal{Z}$.

3.  Does  the  Method  Work? 

To see why minimizing $J(\mathcal{F})$ leads to a reasonable
solution, it is instrumental to note that

$a^T H(\mathcal{Z})\, a = E\{|\psi(Z(m,n))|^2\},$

where $\psi(Z)$ is a two-degree polynomial defined by

$\psi(Z) := a_0 + a_1 Z + a_2 Z^2.$

Let $\beta_1$ and $\beta_2$ represent the zeros of $\psi(Z)$ so that

$\psi(Z) = a_2 (Z - \beta_1)(Z - \beta_2).$

Then, it is easy to see that if $\mathcal{F}$ equals the inverse of
$\mathcal{B}$ or any of its shifted and/or scaled versions, the
output image $\mathcal{Z}$ would coincide with $\mathcal{X}$ up to
a shift and/or a scale and thus become a two-tone image. With
$\beta_1$ and $\beta_2$ representing the two tones of
$\mathcal{Z}$, one would obtain $\psi(Z(m,n)) = 0$, and therefore
$J(\mathcal{F})$ would attain its minimum value zero.
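
As a numerical illustration of this argument, the sketch below builds the moment matrix from sample moments (an assumption: sample averages stand in for the expectations) and checks that the quadratic form equals the mean of the squared polynomial, vanishing when the coefficients place the zeros at the two tones. The tone values, image size, and the two-pixel averaging blur are hypothetical choices made only for this example.

```python
import numpy as np


def moment_matrix(z):
    """3x3 matrix whose (i, j) entry is the sample moment of order i + j."""
    mu = [np.mean(z ** p) for p in range(5)]  # mu[0] equals 1
    return np.array([[mu[i + j] for j in range(3)] for i in range(3)])


def quadratic_cost(z, a):
    """Evaluate a^T H(Z) a for a coefficient vector a = [a0, a1, a2]."""
    a = np.asarray(a, dtype=float)
    return float(a @ moment_matrix(z) @ a)


rng = np.random.default_rng(0)
tones = (2.0, 5.0)                               # hypothetical tone values
two_tone = rng.choice(tones, size=(32, 32))      # i.i.d. two-tone image

# Coefficients of psi(Z) = (Z - beta1)(Z - beta2), so that a2 = 1.
a = [tones[0] * tones[1], -(tones[0] + tones[1]), 1.0]

# A simple hypothetical blur: average each pixel with its left neighbour.
blurred = 0.5 * (two_tone + np.roll(two_tone, 1, axis=1))

cost_two_tone = quadratic_cost(two_tone, a)
cost_blurred = quadratic_cost(blurred, a)
print(cost_two_tone, cost_blurred)
```

The blurred image takes a third, intermediate value, so the polynomial no longer vanishes on every pixel and the cost becomes strictly positive.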

On the other hand, if $J(\mathcal{F}) = 0$ for some
$\mathcal{F} \neq 0$, then $\psi(Z(m,n)) = 0$ for some
$\beta_1 \neq \beta_2$, so that $\mathcal{Z}$ must be a two-tone
image. Therefore, for the justification of the method, it suffices
to ensure that the two-tone image $\mathcal{Z}$ so obtained can
only be a shifted and/or scaled version of the original two-tone
image $\mathcal{X}$. Although one should not expect this for every
two-tone image $\mathcal{X}$, it can be shown that it is true for a
large class of two-tone images [3].

For example, if the pixels of $\mathcal{X}$ are i.i.d., then
$\mathcal{Z}$ must be a shifted and/or scaled version of
$\mathcal{X}$ as soon as it becomes a two-tone image as a result of
the filtering. To see this, we first note that

$\mathcal{Z} = \mathcal{T} \otimes \mathcal{X},$

with $\mathcal{T} := \{t(j,k)\} := \mathcal{F} \otimes \mathcal{B}$
being the combination of the deblurring filter with the blur
filter. We proceed with the justification by contradiction. Suppose
that $\mathcal{Z}$ is a two-tone image but $\mathcal{T}$ is not a
shifted and/or scaled delta function; for simplicity, let us assume
that $t_0 := t(0,0) \neq 0$ and $t_1 := t(1,1) \neq 0$.
Furthermore, without loss of generality, let us assume that
$\mathcal{X}$ takes values in $\{0, 1\}$. Then, at any location
$(m,n)$, we can write

$Z(m,n) = t_0 X(m,n) + t_1 X(m-1,n-1) + R(m,n),$

where $R(m,n) := \sum' t(j,k)\, X(m-j,n-k)$; the sum is computed
for $(j,k) \neq (0,0), (1,1)$. (For simplicity, let us assume that
$\mathcal{T}$ has a finite support.) Because the pixels of
$\mathcal{X}$ are independent, it is always possible to find
locations $(m_i,n_i)$, $i = 1, 2, 3, 4$, such that
$R(m_i,n_i) = r = \text{const.}$ and the pairs
$(X(m_i,n_i), X(m_i-1,n_i-1))$ take the four values
$(0,0), (0,1), (1,0), (1,1)$. This implies that the image
$\mathcal{Z}$ would at least take all the values in
$\{r,\ t_0 + r,\ t_1 + r,\ t_0 + t_1 + r\}$, which contains at
least three distinct values because $t_0, t_1 \neq 0$, and thus
contradicts the assumption that $\mathcal{Z}$ is a two-tone image.

The above argument can be generalized to non-i.i.d. images, with
the relaxed requirement that any finite collection of the two tone
values be visited by the pixels of $\mathcal{X}$ with a positive
probability [3]. This requirement is analogous to the ergodicity
requirements in the classic statistical theory for consistent
parameter estimation. For two-tone images that satisfy this
requirement, it can be shown that $\mathcal{Z}$ must be a shifted
and/or scaled version of $\mathcal{X}$ once $J(\mathcal{F})$ is
minimized with some filter $\mathcal{F} \neq 0$.

4.  How  to  Compute  the  Minimizer? 

The minimization of $J(\mathcal{F})$ is a nonlinear problem and
therefore has to rely upon some iterative procedure. In the
following, we outline a three-step iterative algorithm (TSIA) for
the minimization of $J(\mathcal{F})$. More sophisticated variations
of TSIA are possible, especially for Step 3, but we do not address
them in this article.

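1. With $\mathcal{Z}_k := \mathcal{F}_k \otimes \mathcal{Y}$,
   compute $a_k$ such that

   $a_k := \arg\min \{ a^T H(\mathcal{Z}_k)\, a : a_2 = 1 \}.$

2. Compute $\beta_{1,k}$ and $\beta_{2,k}$ as the zeros of the
   two-degree polynomial $\psi_k(Z) := a_{0,k} + a_{1,k} Z + Z^2$,
   where $a_k := [a_{0,k}, a_{1,k}, 1]^T$.

3. Update the filter by a steepest descent method, with step size
   $\mu > 0$, such that

   $\mathcal{F}_{k+1} := \mathcal{F}_k - \mu \nabla J_k(\mathcal{F}_k),$

   where $J_k(\mathcal{F}) := a_k^T H(\mathcal{Z})\, a_k$ and
   $\mathcal{Z} = \mathcal{F} \otimes \mathcal{Y}$.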

It is clear that TSIA jointly estimates the inverse blur filter,
the original image, and the tones by $\mathcal{F}_k$,
$\mathcal{Z}_k$, and $(\beta_{1,k}, \beta_{2,k})$, respectively.
Note that in order
to avoid the trivial solution of $\mathcal{F} = 0$ it may be
necessary to impose a normalization constraint on the filter
coefficients $f(j,k)$. This constraint does not affect the
viability of the method because the tones are automatically
adjusted in TSIA according to the scale of $\mathcal{Z}$.

The computation of $a_k$ in Step 1 is quite straightforward. In
fact, since $a^T H(\mathcal{Z}_k)\, a$ is a quadratic function of
$a_0$ and $a_1$ under the constraint $a_2 = 1$, it is easy to show
that $a_k$ can be uniquely determined by

$a_{0,k} = (c_{1,k} c_{3,k} - c_{2,k}^2) / (c_{2,k} - c_{1,k}^2)$

and

$a_{1,k} = (c_{1,k} c_{2,k} - c_{3,k}) / (c_{2,k} - c_{1,k}^2),$

where $c_{i,k} := E\{Z_k^i(m,n)\}$ is the $i$th moment of
$\mathcal{Z}_k$ ($i = 1, 2, 3$).
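
Steps 1 and 2 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: sample moments replace the expectations, and the function name `fit_tones` and the test tone values 3 and 7 are assumptions made for the example.

```python
import numpy as np


def fit_tones(z):
    """Closed-form Step 1 followed by Step 2 for a filtered image z.

    Returns (a0, a1, tones), where tones are the zeros of
    psi(Z) = a0 + a1 * Z + Z**2.
    """
    c1, c2, c3 = (np.mean(z ** i) for i in (1, 2, 3))
    denom = c2 - c1 ** 2              # sample variance; zero only for a constant image
    a0 = (c1 * c3 - c2 ** 2) / denom
    a1 = (c1 * c2 - c3) / denom
    roots = np.roots([1.0, a1, a0])   # coefficients in descending powers of Z
    # Real parts are kept to discard any round-off in the imaginary part.
    return a0, a1, np.sort(np.real(roots))


rng = np.random.default_rng(1)
image = rng.choice([3.0, 7.0], size=(16, 16))    # hypothetical two-tone image
a0, a1, tones = fit_tones(image)
print(a0, a1, tones)
```

For an exactly two-tone input the fitted polynomial is (Z - 3)(Z - 7), so the coefficients come out as a0 = 21 and a1 = -10 up to round-off, and the recovered tones are 3 and 7.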

As  a  simple  variation  of  TSIA,  one  may  update  the 
tones  after  the  filter  is  iterated  M  >  1  times.  In  other 
words,  one  may  carry  out  Steps  1  and  2  once  every 
M  iterations  of  Step  3  with  fixed  tones.  Experience 
shows  that  this  variation  reduces  the  computational 
complexity  but  still  achieves  reasonable  results  as  long 
as  M  is  not  too  large. 

5.  How  Does  It  Work  On  Data? 

To test the proposed method, let us consider the blurred text image
shown in Fig. 1. The blur is caused by an autoregressive filter of
the form

$Y(m,n) = \rho_1 Y(m-1,n) + \rho_2 Y(m,n-1) - \rho_1 \rho_2 Y(m-1,n-1) + X(m,n),$

where $\rho_1$ and $\rho_2$ are the filter parameters. In Fig. 1
we use $\rho_1 = \rho_2 = 0.7$.
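
The autoregressive blur can be simulated directly from its recursion. The sketch below is an illustration under stated assumptions: zero boundary conditions outside the image are assumed, and the function names `ar_blur` and `fir_inverse` are hypothetical. It also shows that rearranging the recursion for X(m,n) gives an exact four-tap FIR inverse for this particular blur; the method in the text does not use this knowledge and estimates the inverse filter blindly.

```python
import numpy as np


def ar_blur(x, rho1, rho2):
    """Apply the 2-D autoregressive blur with zero boundary conditions."""
    rows, cols = x.shape
    y = np.zeros((rows + 1, cols + 1))   # extra zero row and column for the boundary
    for m in range(rows):
        for n in range(cols):
            y[m + 1, n + 1] = (rho1 * y[m, n + 1]      # Y(m-1, n)
                               + rho2 * y[m + 1, n]    # Y(m, n-1)
                               - rho1 * rho2 * y[m, n] # Y(m-1, n-1)
                               + x[m, n])
    return y[1:, 1:]


def fir_inverse(y, rho1, rho2):
    """Undo ar_blur by solving the recursion for X(m, n)."""
    yp = np.pad(y, ((1, 0), (1, 0)))     # zero-pad on the top and left
    return (yp[1:, 1:] - rho1 * yp[:-1, 1:] - rho2 * yp[1:, :-1]
            + rho1 * rho2 * yp[:-1, :-1])


rng = np.random.default_rng(2)
original = rng.choice([0.0, 255.0], size=(33, 256))   # two-tone (0-255) test image
blurred = ar_blur(original, 0.7, 0.7)
restored = fir_inverse(blurred, 0.7, 0.7)
print(np.max(np.abs(restored - original)))
```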

For automatic character recognition, the raw images are typically
required to be segmented into binary ones before they can be fed
into a trained recognizer, such as an artificial neural network. A
simple segmentation method is to classify each pixel as being black
or white by comparing the pixel with a predetermined threshold. If
the blurred image in Fig. 1 is segmented without deblurring, one
would obtain a binary image that could be difficult for the
recognizer to handle.
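
Threshold segmentation and the misclassification proportion used in the experiments can be sketched as follows. The helper names are hypothetical, and the quantile-based threshold is one possible way, assumed here, of matching the proportion of white pixels to a known target.

```python
import numpy as np


def segment(image, threshold):
    """Classify each pixel as white (True) or black (False)."""
    return image > threshold


def misclassified_fraction(segmented, reference):
    """Proportion of pixels whose class differs from the reference."""
    return float(np.mean(segmented != reference))


def proportion_matching_threshold(image, white_fraction):
    """Threshold that leaves roughly the given fraction of pixels white."""
    return float(np.quantile(image, 1.0 - white_fraction))


rng = np.random.default_rng(3)
reference = rng.random((33, 256)) < 0.3                     # True marks a white pixel
noisy = 255.0 * reference + rng.normal(0.0, 5.0, reference.shape)

error_fixed = misclassified_fraction(segment(noisy, 127.0), reference)
threshold = proportion_matching_threshold(noisy, reference.mean())
error_matched = misclassified_fraction(segment(noisy, threshold), reference)
print(error_fixed, threshold, error_matched)
```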

Fig. 2(a) shows a segmentation result for the blurred image in
Fig. 1. As we can see, the direct segmentation of the blurred image
produces a result that is unrecognizable perhaps even by trained
human eyes. The difficulty is also reflected objectively by the
proportion of misclassified pixels, which equals 16.19% for the
33-by-256 image in Fig. 2(a).

To improve the segmentation results, we apply the proposed
deblurring method to the blurred image in Fig. 1. The filter
$\mathcal{F}$ is restricted to be of size three-by-three, so that
$f(j,k) = 0$ for $|j| \ge 2$ or $|k| \ge 2$. The nonzero
coefficients of the initial filter $\mathcal{F}_0 := \{f_0(j,k)\}$
are $f_0(0,0) = 1.0$, $f_0(1,0) = f_0(0,1) = -0.2$, and
$f_0(1,1) = 0.04$. Fig. 2(b) shows the segmentation result based on
the output from the initial filter: the image is still difficult to
recognize, with the proportion of misclassified pixels equal to
10.58%.

Figure 2. Segmentation results from: (a) the blurred image in
Fig. 1; (b) the output of the initial filter; and (c) the deblurred
image after 4,000 iterations of TSIA. The original text image is
shown in (d).

Initiated with $\mathcal{F}_0$, we update the deblurring filter
using the TSIA algorithm. The step size parameter is taken to be
$\mu = 2 \times 10^{-10}$ and the tones are estimated once every
M = 30 iterations of Step 3. Fig. 2(c) shows the segmentation
result based on a deblurred image after 4,000 iterations. Compared
with Figs. 2(a)-(b), and with the original two-tone (0-255) image
in Fig. 2(d), the improvement in recognizability is quite
significant, not only visually but also in terms of the proportion
of misclassified pixels, which is now reduced to 0.57%.

The convergence behavior of TSIA is shown in Fig. 3 in terms of the
cost function $J(\mathcal{F})$. As we can see, the cost function
decreases monotonically as the iteration proceeds. The convergence
rate is high at the beginning, but the algorithm slows down after a
certain number of iterations (roughly 2,000 in this example). This
behavior is typical of steepest-descent-type algorithms [6].

To obtain the segmented images in Fig. 2, the segmentation
threshold is determined so that the proportions of black and white
pixels after the segmentation equal those in the original image.
Fig. 4 and Table 1 demonstrate the impact of the threshold on the
segmentation results in terms of the classification errors. When
the images are blurred, the segmentation error depends crucially on
the threshold, as shown by the dashed and dotted lines in Fig. 4.
The threshold is much less critical when the blur is properly
removed, as shown by the solid line in Fig. 4. In this example, the
deblurred image produces dramatically improved segmentation results
for all the thresholds.

Figure 3. Plot of the cost function $J(\mathcal{F})$ (in
logarithmic scale) against the number of iterations. The TSIA
parameters are: $\mu = 2 \times 10^{-10}$ and M = 30.

Table 1. Percentage of misclassified pixels

Threshold   Blurred   Initial   Deblurred
    63       20.33     19.53      0.86
    76       18.34     17.34      0.47
    89       16.60     14.73      0.24
   102       15.42     12.56      0.16
   114       15.11     10.90      0.33
   127       15.29     10.36      0.54
   140       16.50     11.50      1.06

6. Conclusions

In  this  article  we  have  proposed  a  method  for  the 
restoration  of  blurred  two-tone  images  when  the  blur 
filter  is  unknown.  The  method  jointly  estimates  the 
blur  filter  along  with  the  original  image  and  the  tone 
values  based  solely  on  the  blurred  image. 

The method shows that it is possible to blindly deblur certain
two-tone images of correlated pixels without explicitly estimating
or modeling their statistical properties. The text image example
demonstrates the potential usefulness of the method for front-end
processing in automatic character recognition systems.

Figure 4. Proportion of misclassified pixels as a function of the
segmentation threshold: dashed line for the blurred image, dotted
line for the image from the initial filter, and solid line for the
deblurred image after 4,000 iterations.

For future research, one can certainly extend the results in this
article to multi-tone images and explore other possible cost
functions and algorithms for the implementation. The bottom line is
that if one judiciously takes advantage of the prior two-tone (or
multi-tone) information, simple yet powerful deblurring algorithms
can be obtained.
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Abstract 

This paper addresses the problem of (almost) periodic moving
average (APMA) system identification. Two normal equations are
established by using time-varying higher order cumulants of the
measurements, from which two new linear algebraic algorithms are
presented for parameter estimation. Simulation examples are given
to demonstrate the performance of these new approaches.


1. Introduction

During the last decade, the problem of identifying linear
time-invariant moving average (LTI-MA) systems via higher order
statistics (HOS) has been well studied; see [1] for an excellent
overview. However, much less attention has been paid to identifying
time-varying systems.

This paper addresses the problem of (almost) periodic moving
average (APMA) system identification. APMA systems are an important
class of time-varying systems which are often encountered in
hydrology, meteorological data and telecommunication systems.
Recently, Dandawate and Giannakis [2] introduced a novel approach
for modeling APMA processes using higher order cyclic-statistics
(HOCS). This approach is based on two main steps: (i) by exploiting
the cyclostationarity property of APMA processes, the time-varying
higher order cumulants (TV-HOC's) are estimated via the HOCS's
computed from one data record; and then (ii) the model parameters
are estimated from the TV-HOC's by using linear or nonlinear
programming algorithms. This method is conceptually simple;
however, as pointed out in [2], the bias and variance of these
parameter estimators are considerably high, especially for the
linear algorithms. This is expected since the linear algorithms in
[2] are time-varying extensions of the so-called closed-form
solutions designed for LTI-MA system identification, which would
exhibit poor numerical performance without smoothing out the
effects of the estimation errors of cumulants [1]. Therefore, new
linear algorithms with better numerical performance are highly
desirable.

As a matter of fact, linear algebraic methods are appropriate
choices for improving the performance of parameter estimates in
LTI-MA system identification [1]. It is reasonable to expect that
linear algebraic methods outperform the closed-form methods even
for the time-varying case. The objective of this paper is to
present a linear algebraic approach for APMA system identification.
Two new normal equations based on TV-HOC's are established, from
which the time-varying system parameters are uniquely recovered
(within some constant scale factor). Simulation examples are
presented to demonstrate the performance of this paper's
approaches.

2.  Problem  Statements 

Throughout this paper, it is assumed that the channel is an APMA
system of order $q$, and that the measurement output $z(n)$ is
described by

$z(n) = x(n) + w(n) = \sum_{k=0}^{q} b(n;k)\, e(n-k) + w(n)$ (1)

where $e(n)$ and $x(n)$ are the driving noise and the "real" system
output, respectively; and $w(n)$ is the measurement noise. The
impulse response coefficients, $b(n;k)$'s, vary almost periodically
(AP) with respect to $n$ at each lag $k$, thus $x(n)$ is a
time-varying, but cyclostationary process [2].
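
To make model (1) concrete, the sketch below simulates a strictly periodic (period 2) special case and estimates the time-varying third-order moment at zero lags by averaging over samples that share the same phase. Several choices are assumptions made only for this illustration: the coefficient values, the unit-rate exponential input (whose third-order cumulant is 2), and white rather than colored Gaussian noise.

```python
import numpy as np

Q = 2                                    # MA order q
B = np.array([[1.0, 0.1, -0.2],          # b(n; k) for even n
              [1.5, 0.7, 0.4]])          # b(n; k) for odd n


def simulate_pma(num_samples, noise_std, rng):
    """Generate z(n) = sum_k b(n; k) e(n - k) + w(n) for a period-2 filter."""
    e = rng.exponential(1.0, num_samples + Q) - 1.0   # zero mean, third cumulant 2
    n = np.arange(num_samples)
    x = np.zeros(num_samples)
    for k in range(Q + 1):
        x += B[n % 2, k] * e[n - k + Q]               # e(n - k) is stored at index n - k + Q
    w = noise_std * rng.standard_normal(num_samples)  # white Gaussian noise (simplification)
    return x + w


rng = np.random.default_rng(0)
z = simulate_pma(400_000, 0.5, rng)

# Phase-wise sample average of z(n)^3, one value per phase of the period.
third_moment_hat = np.array([np.mean(z[p::2] ** 3) for p in range(2)])
# For independent inputs the third moment is gamma_3e * sum_k b(n; k)^3.
third_moment_true = 2.0 * np.sum(B ** 3, axis=1)
print(third_moment_hat, third_moment_true)
```

The Gaussian noise inflates the variance of the estimate but does not bias it, because the third-order cumulant of a Gaussian process is zero.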
Let $m_{lx}(n;\tau)$ and $c_{lx}(n;\tau)$ denote the $l$th-order
time-varying moment and cumulant of $x(n)$ at lag
$\tau = (\tau_1, \ldots, \tau_{l-1})$, respectively. The rigorous
relation between $c_{lx}(n;\tau)$ and $m_{lx}(n;\tau)$ can be found
in [6]. Specifically, for (time average) zero-mean processes with
$l = 2, 3$ and $4$, we have

$c_{2x}(n;\tau) = m_{2x}(n;\tau),$ (2)

$c_{3x}(n;\tau_1,\tau_2) = m_{3x}(n;\tau_1,\tau_2),$ (3)

and

$c_{4x}(n;\tau_1,\tau_2,\tau_3) = m_{4x}(n;\tau_1,\tau_2,\tau_3) - m_{2x}(n;\tau_1)\, m_{2x}(n+\tau_2;\tau_3-\tau_2) - m_{2x}(n;\tau_2)\, m_{2x}(n+\tau_3;\tau_1-\tau_3) - m_{2x}(n;\tau_3)\, m_{2x}(n+\tau_1;\tau_2-\tau_1).$ (4)

Since $b(n;k)$ is AP in $n$, $m_{lx}(n;\tau)$ is also AP in $n$ for
each $\tau$. Hence, it possesses the following Fourier series
representation

$m_{lx}(n;\tau) = \sum_{\alpha \in A_{lx}} M_{lx}(\alpha;\tau)\, e^{j\alpha n},$ (5)

where $M_{lx}(\alpha;\tau)$ is called the $l$th-order cyclic-moment
at cycle $\alpha$, and

$M_{lx}(\alpha;\tau) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} m_{lx}(n;\tau)\, e^{-j\alpha n},$ (6)

and the set of cycles in (6) is defined as
$A_{lx} = \{\alpha : M_{lx}(\alpha;\tau) \neq 0,\ 0 \le \alpha < 2\pi\}$.
By [2], when $x(n)$ is a mixing process, a consistent and
asymptotically normal estimate of $M_{lx}(\alpha;\tau)$ can be
obtained via the sample average technique based on one data record,
and the $l$th-order time-varying moment, and thus cumulant, can be
computed via (5) and (2)-(4).

In this paper, we concentrate our attention on identifying the
time-varying impulse response coefficients from the TV-HOC's of the
output measurements. Therefore, we assume that the cycles
$\alpha \in A_{lx}$ and the order $q$ are known a priori.
Otherwise, they can be estimated first by performing the
statistical tests for presence of cyclostationarity in the
measurements [2, 5]. In addition, the following conditions are
assumed to hold.

AS1) The driving noise sequence $e(n)$ is unobservable, and is a
zero-mean, independent and identically distributed (i.i.d.),
non-Gaussian process with some $l$ such that
$0 < |\gamma_{le}| < \infty$, $l > 2$, where $\gamma_{le}$ is the
$l$th-order cumulant of $e(n)$.

AS2) The measurement noise $w(n)$ is a Gaussian (colored) process
independent of $e(n)$, and hence of $x(n)$.


3. Two Linear Algebraic Algorithms

It is well known that under condition AS2)

$c_{lz}(n;\tau) = c_{lx}(n;\tau), \quad l \ge 3,$ (7)

because the higher than second-order cumulants of Gaussian
processes are identically zero, and that the $l$th-order cumulants
are related to the impulse response coefficients by [2]

$c_{lx}(n;\tau_1,\ldots,\tau_{l-1}) = \gamma_{le} \sum_{k=0}^{q} b(n;k)\, b(n+\tau_1;k+\tau_1) \cdots b(n+\tau_{l-1};k+\tau_{l-1}).$ (8)

By using the fact that $b(n;k) = 0$ for $k < 0$ and $k > q$, it
follows that

$c_{lz}(n;\tau_1,\ldots,\tau_{l-3},q,k) = \gamma_{le}\, b(n;0)\, b(n+\tau_1;\tau_1) \cdots b(n+\tau_{l-3};\tau_{l-3})\, b(n+q;q)\, b(n+k;k).$ (9)

Based on (9), three linear algorithms were developed by Dandawate
and Giannakis [2] for estimating the unknown parameters,
$b(n;k)$'s, within some constant scale factor. Here we refer to
them as DG-1, DG-2 and DG-3, respectively.

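(1) DG-1: The impulse response coefficients are given by

$b(n;k) = \frac{\gamma_{(l-1)e}}{\gamma_{le}} \cdot \frac{c_{lz}(n-k;\tau_1,\ldots,\tau_{l-3},q,k)}{c_{(l-1)z}(n-k;\tau_1,\ldots,\tau_{l-3},q)}.$ (10)

(2) DG-2: The impulse response coefficients are given by

$b(n;k) = \frac{c_{3z}(n-k;q,k)}{\gamma_{3e}^{1/3} \left[ c_{3z}(n-k;q,q)\, c_{3z}(n-k;q,0) \right]^{1/3}}.$ (11)

(3) DG-3: The impulse response coefficients are given by

$b(n+k;k) = \gamma_{le}^{-1/l} \left[ \frac{c_{lz}^2(n;0,\ldots,0,q,0)}{c_{lz}(n;0,\ldots,0,q,q)} \right]^{1/l} \frac{c_{lz}(n;\tau_1,\ldots,\tau_{l-3},q,k)}{c_{lz}(n;\tau_1,\ldots,\tau_{l-3},q,0)}.$ (12)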


Although the above algorithms are based upon simple computations,
they belong to the closed-form solutions which do not smooth out
the effects of the estimation errors of the TV-HOC's. Now, let us
establish two normal equations based on the TV-HOC's of the output
measurements, and develop two linear algebraic algorithms for
estimating the APMA system parameters.

3.1. Normal Equation based on the Diagonal Slice of TV-HOC's

Letting $\tau_1 = \cdots = \tau_{l-1} = \tau$ in (8), we have

$c_{lz}(n;\tau,\ldots,\tau) = \gamma_{le} \sum_{k=0}^{q} b(n;k)\, [b(n+\tau;k+\tau)]^{l-1}.$ (13)

On the other hand, from (9), we have

$c_{lz}(n-k;0,\ldots,0,q,k+\tau) = \gamma_{le}\, [b(n-k;0)]^{l-2}\, b(n-k+q;q)\, b(n+\tau;k+\tau)$ (14)

$c_{lz}(n-k;0,\ldots,0,q,0) = \gamma_{le}\, [b(n-k;0)]^{l-1}\, b(n-k+q;q),$ (15)

which yield

$b(n+\tau;k+\tau) = b(n-k;0)\, \frac{c_{lz}(n-k;0,\ldots,0,q,k+\tau)}{c_{lz}(n-k;0,\ldots,0,q,0)}$ (16)

$b(n-k;0) = \left( \frac{c_{lz}^2(n-k;0,\ldots,0,q,0)}{\gamma_{le}\, c_{lz}(n-k;0,\ldots,0,q,q)} \right)^{1/l}$ (17)

where we assume $b(n-k;0) > 0$ when even $l$ is used. Substitute
(16) and (17) into (13), and denote

$a_l(n+\tau;k+\tau) := \left[ \frac{c_{lz}^2(n-k;0,\ldots,0,q,0)}{c_{lz}(n-k;0,\ldots,0,q,q)} \right]^{(l-1)/l} \left[ \frac{c_{lz}(n-k;0,\ldots,0,q,k+\tau)}{c_{lz}(n-k;0,\ldots,0,q,0)} \right]^{l-1}.$ (18)

Considering that $c_{lz}(n;\tau,\ldots,\tau) = 0$ for $|\tau| > q$,
and that

$a_l(n+\tau;q) \neq 0$, for any $\tau$ (19)

$a_l(n+\tau;k+\tau) = 0$, for $k+\tau < 0$ and $k+\tau > q$ (20)

due to (9) and (18), we have

$c_{lz}(n;\tau,\ldots,\tau) = \sum_{k=0}^{q} \gamma_{le}^{1/l}\, b(n;k)\, a_l(n+\tau;k+\tau), \quad -q \le \tau \le q.$ (21)

Equation (21) is referred to as the normal equation based on the
diagonal slice of TV-HOC's for time-varying MA($q$) system
identification.


Concatenating (21) for $\tau = -q, \ldots, 0, \ldots, q$, we obtain
the following matrix equation

$A x = b$ (22)

where $A$ contains the submatrix $A_1$ as

$A_1 = \begin{bmatrix} a_l(n;q) & a_l(n;q-1) & \cdots & a_l(n;0) \\ & a_l(n+1;q) & \cdots & a_l(n+1;1) \\ & & \ddots & \vdots \\ 0 & & & a_l(n+q;q) \end{bmatrix}$ (23)

and

$x = \gamma_{le}^{1/l}\, [b(n;q), b(n;q-1), \ldots, b(n;0)]^T.$ (24)

From (23) and (19), it is easy to see that $A_1$ has full rank
$q+1$ due to $\det(A_1) = \prod_{i=0}^{q} a_l(n+i;q) \neq 0$, thus
the coefficient matrix in (22) has full rank $q+1$. This shows that
the matrix equation (22) with $(q+1)$ unknown parameters has a
unique least-squares (LS) solution given by

$x = (A^T A)^{-1} A^T b.$ (25)

Solving (25), we obtain the estimates of
$\gamma_{le}^{1/l}\, b(n;k)$'s, which are related to $b(n;k)$'s
within a constant scale. We refer to the solution based on (21) as
Algorithm 1.
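
The diagonal-slice construction can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses exact third-order cumulants computed from (8) for a hypothetical period-2 MA(2) system with unit input cumulant, whereas in practice the cumulants would be estimated from data. It builds the coefficients of (18) for l = 3, stacks (21) for all lags, and solves the system by least squares. The unknowns are ordered k = 0, ..., q here, the reverse of (24).

```python
import numpy as np

Q = 2                                   # MA order q
B = np.array([[1.0, 0.1, -0.2],         # b(n; k) for even n (hypothetical values)
              [1.5, 0.7, 0.4]])         # b(n; k) for odd n
GAMMA = 1.0                             # third-order cumulant of the input


def b(n, k):
    """Periodic impulse response, zero outside 0 <= k <= Q."""
    return B[n % 2, k] if 0 <= k <= Q else 0.0


def c3(n, tau1, tau2):
    """Exact third-order time-varying cumulant from (8) with l = 3."""
    return GAMMA * sum(b(n, k) * b(n + tau1, k + tau1) * b(n + tau2, k + tau2)
                       for k in range(Q + 1))


def a3(n, k, tau):
    """Coefficient a_3(n + tau; k + tau) of (18), built from cumulants only."""
    if not 0 <= k + tau <= Q:
        return 0.0
    m = n - k
    scale = np.cbrt(c3(m, Q, 0) ** 2 / c3(m, Q, Q)) ** 2   # exponent (l-1)/l = 2/3
    return scale * (c3(m, Q, k + tau) / c3(m, Q, 0)) ** 2


def algorithm1(n):
    """Least-squares solution of the stacked diagonal-slice equations (21)."""
    taus = range(-Q, Q + 1)
    A = np.array([[a3(n, k, tau) for k in range(Q + 1)] for tau in taus])
    rhs = np.array([c3(n, tau, tau) for tau in taus])
    x, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return x                           # estimates of GAMMA**(1/3) * b(n; k)


print(algorithm1(0), algorithm1(1))
```

With exact cumulants and a unit input cumulant, the solution reproduces the coefficients of each phase; with estimated cumulants the overdetermined system averages out part of the estimation error.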

3.2. Normal Equation based on All Slices of TV-HOC's

From (16) and (17), $b(n+\tau_i;k+\tau_i)$ can be expressed as

$b(n+\tau_i;k+\tau_i) = \gamma_{le}^{-1/l}\, b_l(n+\tau_i;k+\tau_i)$ (26)

where

$b_l(n+\tau_i;k+\tau_i) := \left[ \frac{c_{lz}^2(n-k;0,\ldots,0,q,0)}{c_{lz}(n-k;0,\ldots,0,q,q)} \right]^{1/l} \frac{c_{lz}(n-k;0,\ldots,0,q,k+\tau_i)}{c_{lz}(n-k;0,\ldots,0,q,0)}.$ (27)

By substituting (26) and (27) into (8) and denoting

$e_l(n;k;\tau_1,\ldots,\tau_{l-1}) = \prod_{i=1}^{l-1} b_l(n+\tau_i;k+\tau_i),$ (28)

it follows that

$c_{lz}(n;\tau_1,\ldots,\tau_{l-1}) = \sum_{k=0}^{q} \gamma_{le}^{1/l}\, b(n;k)\, e_l(n;k;\tau_1,\ldots,\tau_{l-1}), \quad -q \le \tau_i \le q,\ i = 1, \ldots, l-1.$ (29)
Equation (29) is referred to as the normal equation based on all
slices of TV-HOC's for time-varying MA($q$) system identification.

Concatenating (29) for $\tau_i = -q, \ldots, 0, \ldots, q$, and
$i = 1, \ldots, l-1$, we obtain an overdetermined set of linear
equations, which includes that given by (22). Since the rank of the
coefficient matrix in (22) is $q+1$, the coefficient matrix based
on (29) is of rank $q+1$ too. This means that equation (29) gives a
unique LS solution (within some constant scale). We refer to the
solution based on (29) as Algorithm 2.

Several  remarks  are  in  order. 

Remark 1: Equations (21) and (29) hold for a general time-varying
MA($q$) model. However, to implement these schemes in practice, it
is necessary to estimate the involved TV-HOC's first, which may not
be available from only one record of data. Fortunately, for an APMA
system, the output TV-HOC's can be computed via the sample average
technique from only one record of data, as discussed in Section 2,
which makes it possible to identify the time-varying impulse
response coefficients.

Remark 2: For time-invariant MA($q$) system identification, a
variety of linear algebra solutions have been developed by using
overparameterization techniques. However, these techniques cannot
be applied to time-varying system identification because the number
of unknowns may be greater than the number of equations available
in such a case. On the other hand, our two algorithms are well
parameterized in the sense that only the $b(n;k)$'s are estimated.

Remark 3: Algorithm 1 exploits the diagonal slice of TV-HOC's only,
whereas Algorithm 2 uses all slices of TV-HOC's. By using the
symmetry property of TV-HOC's, it is possible to reduce the number
of equations in Algorithm 2. In addition, it is widely recognized
[1] that additional cumulant slices may be exploited to improve the
performance of parameter estimates. Therefore, Algorithm 2 is
expected to outperform Algorithm 1.

4. Simulations

Two simulation examples are presented to illustrate the performance
of our new algorithms for TVMA system identification. Here,
$b(n;k)$ was chosen to be a periodically time-varying filter with
period 2, thus the cyclic frequencies are $\alpha = 0$ and
$\alpha = \pi$, and 100 Monte Carlo runs were performed with each
consisting of $N$ data samples. In each run, the input $e(n)$ was
chosen to be i.i.d. exponential with $\gamma_{3e} = 1$, and the
measurement noise $w(n)$ was a zero-mean Gaussian AR process with
poles at $0.8 \pm j0.2$. The signal-to-noise ratio (SNR) was
defined as $\mathrm{SNR} = 10 \log_{10}(E\{x^2(n)\}/E\{w^2(n)\})$.
The accuracy of system identification is assessed by calculating
the mean square error (MSE)

$\mathrm{MSE} = 10 \log_{10} \{\cdots\}$ (30)

where $\hat{b}(n;i)$ is the estimated system parameter.

Example 1: The system impulse response coefficients were given by
[2]

$b(0;k) = [1, 0.1, -0.2],$ (31)

$b(1;k) = [1.5, 0.7, 0.4].$ (32)

The data length was chosen to be $N = 2 \times 4096$. Table I
summarizes the mean ± standard deviation of the parameter estimates
for $n = 0$ and $n = 1$ at SNR = 0 dB obtained by the DG-2 approach
based on (11) and this paper's two new algorithms ($l = 3$).

Example 2: The system impulse response coefficients were given by

$b(0;k) = [1, 0.9, 0.385, -0.771],$ (33)

$b(1;k) = [1, -1.967, 0.5666, 0.2].$ (34)

The data length was chosen to be $N = 4 \times 4096$. Table II
summarizes the mean ± standard deviation of the parameter estimates
for $n = 0$ and $n = 1$ at SNR = 10 dB obtained by the DG-2
approach and this paper's two new algorithms ($l = 3$).

From the parameter estimation results shown in Tables I and II, we
have the following observations:

(1) From the MSE's, our two new algorithms perform better than the
DG-2 approach. This is because the DG-2 approach is sensitive to
the estimation errors of TV-HOC's, whereas our new algorithms are
much more robust to this kind of error.

(2) The estimates obtained via Algorithm 2 are better (in terms of
bias and deviation) than the estimates obtained via Algorithm 1.
This is expected since the set of output statistics exploited by
Algorithm 2 is much larger than the set of output statistics
exploited by Algorithm 1; see Remark 3.


5.  Conclusions 

In this paper, we have established two normal equations for TVMA
system identification and have developed two linear algebraic
algorithms whose uniqueness is guaranteed (within some constant
factor). These methods use only higher order cumulants and
consequently yield consistent parameter estimation in the presence
of additive colored noise with vanishing higher order cumulants.
Simulations have shown that the two new algorithms have better
performance than the existing closed-form algorithms.


References 

[1] J. M. Mendel, "Tutorial on higher-order statistics (spectra) in
signal processing and system theory: Theoretical results and some
applications," Proc. IEEE, vol. 79, pp. 278-305, Mar. 1991.

[2] A. Dandawate and G. B. Giannakis, "Modeling (almost) periodic
moving average processes using cyclic statistics," IEEE Trans.
Signal Processing, vol. 44, pp. 673-684, Mar. 1996.

[3] X.-D. Zhang and Y.-S. Zhang, "FIR system identification using
higher order statistics alone," IEEE Trans. Signal Processing,
vol. 42, pp. 2854-2858, Oct. 1994.

[4] X.-D. Zhang and Y.-S. Zhang, "Singular value
decomposition-based MA order determination of non-Gaussian ARMA
models," IEEE Trans. Signal Processing, vol. 41, pp. 2657-2664,
Aug. 1993.

[5] A. Dandawate and G. B. Giannakis, "Statistical tests for
presence of cyclostationarity," IEEE Trans. Signal Processing,
vol. 42, pp. 2355-2369, Sept. 1994.


TABLE II

Parameter Estimation Statistics for Example 2:
N = 4 × 4096, SNR = 10 dB, 100 Monte Carlo Runs

Parameters          DG-2                 Algorithm 1          Algorithm 2
b(0;0) = 1.0         0.0399 (±1.2045)     1.2573 (±0.7006)     1.0994 (±0.5694)
b(0;1) = 0.9         0.8884 (±0.1058)     0.8482 (±0.3279)     0.8619 (±0.0881)
b(0;2) = 0.385      -0.1239 (±0.6297)     0.5268 (±0.3570)     0.4475 (±0.2465)
b(0;3) = -0.771     -0.7750 (±0.0580)    -0.7936 (±0.2497)    -0.7654 (±0.0670)
b(1;0) = 1.0         1.0052 (±0.0882)     1.0285 (±0.7053)     0.9445 (±0.2202)
b(1;1) = -1.967     -0.0603 (±2.3915)    -2.5787 (±1.4547)    -2.2176 (±1.1386)
b(1;2) = 0.5666      0.5800 (±0.0947)     0.6793 (±0.8961)     0.5978 (±0.1376)
b(1;3) = 0.2         0.2945 (±0.1860)     0.2034 (±0.3212)     0.2095 (±0.1274)
MSE                 -2.0839              -12.1293             -19.7690

[6] D.R. Brillinger, Time Series: Data Analysis and Theory, Holden-Day Inc., San Francisco, 1981.


TABLE I

Parameter Estimation Statistics for Example 1:
N = 2 x 4096, SNR = 0 dB, 100 Monte Carlo Runs

Parameters         DG-2                 Algorithm 1          Algorithm 2
b(0;0) = 1.0       0.4474 (±0.9678)     1.1358 (±0.7106)     1.0385 (±0.5385)
b(0;1) = 0.1       0.1032 (±0.1377)     0.1006 (±0.0608)     0.1103 (±0.0675)
b(0;2) = -0.2      -0.2453 (±0.1254)    -0.0884 (±0.4480)    -0.1672 (±0.1745)
b(1;0) = 1.5       1.2956 (±0.9438)     1.5039 (±0.4643)     1.4383 (±0.3876)
b(1;1) = 0.7       0.0675 (±0.7105)     0.7934 (±0.4621)     0.7515 (±0.3933)
b(1;2) = 0.4       0.1745 (±0.1305)     0.3977 (±0.1366)     0.4057 (±0.1443)
MSE                -8.9167              -19.9852             -26.3467


Abstract

A method for the identification of discrete nonlinear systems in terms of the Volterra-Wiener series is presented. It is shown that using a special composite-frequency input signal as an approximation to Gaussian noise makes the method computationally efficient, especially for high-order kernels. Orthogonal functionals and consistent estimates of the Wiener kernels in the frequency domain are derived for this class of noise input. The basis of the proposed computational procedure for practical identification is the fast Fourier transform (FFT) algorithm, which is used both for generating the input signals and for analyzing the system responses.


1.  Introduction 

Continued interest has been shown in the use of functional series in the modeling, identification and control of nonlinear systems [1-6] since the initial work by Wiener [1]. He considered the class of causal systems that produce an output with finite mean-square value when their input is Gaussian white noise. The output $y(t)$ of an unknown nonlinear "black-box" system can be approximated by a series of functionals $G_m[h_m, x(t)]$ of the input $x(t)$ as

$$y(t) = \sum_{m=0}^{M} G_m[h_m, x(t)] \tag{1}$$

where $h_m$ is the Wiener kernel of order $m$.

A main difficulty in applying the Wiener approach to identification problems is the measurement of the kernels, which is computationally demanding. Several methods have been presented to find the Wiener kernels of nonlinear systems from given input-output pairs. Lee and Schetzen [7] showed that the kernels can be estimated by input-output crosscorrelation. French and Butz [8] used the fast Fourier and Walsh transform algorithms to calculate the Wiener kernels. However, these methods involve some difficulties:

•  A white Gaussian process is unrealizable.

•  The formula for kernels of order m>2 involves Dirac delta functions when two or more of the kernel's arguments are equal.

•  The required computation increases very rapidly with the order of the Wiener kernel being calculated.

This paper resolves these difficulties by investigating discrete systems with a special type of noise input generated using the FFT algorithm. We construct the G-functionals for such inputs and present the formula for the Wiener kernels and an efficient identification algorithm in the frequency domain.

2.  Forcing  functions  for  nonlinear  systems 
testing 

The system stimulus must, on the one hand, resemble random noise in order to extract maximum information about the unknown system and, on the other hand, simplify the G-functionals and the identification procedure as a whole. Taking these requirements into consideration, let us consider as a test input the following periodic noise approximation

$$x(n) = \sum_{k=-N_x}^{N_x} X(k)\exp\left(j\frac{2\pi k n}{N}\right). \tag{2}$$

Here $X(k) = A(k)\exp[j\varphi(k)]$ are complex Fourier coefficients, where the amplitudes $A(k)$ determine the power spectrum of the input, and the phases $\varphi(k)$ are independent random values with uniform distribution.

For a zero-mean real signal, the complex-valued Fourier coefficients satisfy $X(0)=0$ and $X(-k)=X^*(k)$. According to the Central Limit Theorem, the signal in the form of (2), being a sum of independent random quantities, has a nearly Gaussian distribution for sufficiently large $N_x$. For every set $\varphi_l(k)$ of the random phases, formula (2) determines a sequence $x_l(n)$ of $N$ samples, which may be formed by the inverse FFT of the coefficients $X_l(k)$.


3. G-functionals and Wiener kernels in the frequency domain


According to the proposed method of test signal generation, the random input process $x(n)$ is determined by the set of input Fourier coefficients $X(k)$, $k=0,\ldots,N_x$, and the corresponding response $y(n)$ can be characterized by the set of output Fourier coefficients $Y(k)$, $k=0,\ldots,N_y$. Therefore, it is possible to rewrite the input-output relationship (1) for nonlinear systems in the frequency domain as

$$Y_M(k) = \sum_{m=0}^{M} G_m[H_m, X(k)] \tag{3}$$

where $H_m(k_1,\ldots,k_m)$ is the multidimensional discrete Fourier transform (DFT) of the Wiener kernel $h_m(n_1,\ldots,n_m)$.

By using a Gram-Schmidt orthogonalization procedure, the G-functionals can be shown to be

$$G_m[H_m, X(k)] = \sum_{D_m} \cdots \tag{4}$$

where the summation extends over the $m$-dimensional region $D_m$ consisting of the various combinations of integers such that $k_1 > k_2 > \ldots > k_m$ and $k_i \neq -k_j$, and $\delta_{ij}$ is a Kronecker delta.

The Wiener kernels in the frequency domain for the model (3) can be determined by minimizing the mean square error between the DFT $Y(k)$ of the system response and the DFT $Y_M(k)$ of the model response

$$F = E\{\Delta\mathbf{Y}^{*T}\,\Delta\mathbf{Y}\} \to \min$$

where $\Delta\mathbf{Y}$ is the vector of complex errors with elements $\Delta Y(k) = Y(k) - Y_M(k)$, $E\{\cdot\}$ denotes the averaging operation, and $*$ represents the complex conjugate.

Minimizing this function, the optimal Wiener kernels become

$$H_m(k_1,\ldots,k_m) = \frac{E\{Y(k_1+\ldots+k_m)\,X^*(k_1)\cdots X^*(k_m)\}}{A^2(k_1)\cdots A^2(k_m)}.$$


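In order to construct an estimate of the kernel $H_m(k_1,\ldots,k_m)$ that is suitable in practice, let us introduce the periodogram

$$I_l(k_1,\ldots,k_m) = Y_l(k_1+\ldots+k_m)\exp\left[-j\sum_{i=1}^{m}\varphi_l(k_i)\right]. \tag{5}$$

Then the estimate is obtained by averaging $L$ periodograms for different blocks of data as follows

$$\hat{H}_m(k_1,\ldots,k_m) = \frac{1}{L}\sum_{l=1}^{L}\frac{I_l(k_1,\ldots,k_m)}{A(k_1)\cdots A(k_m)}. \tag{6}$$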


4.  Identification  algorithm 

If the random phases $\varphi(k)$ are formed by random sampling from the set of numbers $2\pi r/R$, $r=0,\ldots,R-1$, equation (5) for the periodogram may be rewritten as

$$I_l(k_1,\ldots,k_m) = Y_l(k_1+\ldots+k_m)\exp\left[-j\frac{2\pi}{R}\{s_l(k_1)+\ldots+s_l(k_m)\} \bmod R\right] \tag{7}$$

where $s_l(k)$ is the $l$-th set of random integers, and $\{\cdot\} \bmod R$ denotes summation defined modulo $R$.

The calculation of the Wiener kernels of order $m>1$ may be performed more efficiently by noting that the periodogram (7) can take only a limited number of values $Y_l(k)\exp(-j2\pi i/R)$, $k=0,\ldots,N_y$, $i=0,\ldots,R-1$. This allows us to form in advance the array of possible products for every DFT $Y_l(k)$. Thus, the identification algorithm consists of the following steps:

1. Generation of the random integers $s_l(k)$ and forming of the complex Fourier coefficients $X_l(k) = A(k)\exp[j2\pi s_l(k)/R]$.

2. Calculation of the $l$-th block of the input signal using the inverse FFT algorithm

$$x_l(n) = \sum_{k=-N_x}^{N_x} X_l(k)\exp\left(j\frac{2\pi k n}{N}\right), \quad n=0,\ldots,N-1.$$

3. Stimulation of the system by the input signal $x_l(n)$ and registration of the response $y_l(n)$.

4. Calculation of the complex Fourier coefficients $Y_l(k)$ using the FFT algorithm

$$Y_l(k) = \frac{1}{N}\sum_{n=0}^{N-1} y_l(n)\exp\left(-j\frac{2\pi k n}{N}\right), \quad k=0,\ldots,N_y.$$

5. Definition of the array $Z_l(k,i)$ of all the possible values of the periodograms

$$Z_l(k,i) = Y_l(k)\exp(-j2\pi i/R), \quad k=0,\ldots,N_y, \quad i=0,\ldots,R-1.$$

6. Forming of the periodograms from the array $Z_l(k,i)$

$$I_l(k_1,\ldots,k_m) = Z_l\left(k_1+\ldots+k_m,\ \{s_l(k_1)+\ldots+s_l(k_m)\} \bmod R\right).$$

7. Calculation of the kernel estimates using eqn. (6).
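The seven steps above can be sketched in Python for the first-order kernel. This is an illustrative sketch under stated assumptions, not the authors' Matlab implementation: the function names `make_block` and `first_order_kernel` are hypothetical, the normalization of step 7 divides by $A(k)$ (which is what follows from $X(k)=A(k)\exp[j\varphi(k)]$ and the periodogram (5)), and the system under test is a hypothetical FIR filter applied by circular convolution so that the periodic steady state is reproduced exactly.

```python
import numpy as np

def make_block(A, R, N, rng):
    """Steps 1-2: random integers s(k) and one block x(n) of the test signal.
    A[k] is the amplitude at frequency k = 0..Nx, with A[0] = 0 (zero mean)."""
    Nx = len(A) - 1
    s = rng.integers(0, R, size=Nx + 1)            # phases are 2*pi*s(k)/R
    X = np.zeros(N, dtype=complex)
    X[1:Nx + 1] = A[1:] * np.exp(2j * np.pi * s[1:] / R)
    X[N - Nx:] = np.conj(X[1:Nx + 1][::-1])        # X(-k) = X*(k) gives a real x(n)
    x = (N * np.fft.ifft(X)).real                  # x(n) = sum_k X(k) exp(j 2 pi k n / N)
    return x, s

def first_order_kernel(system, A, R, N, L, seed=0):
    """Steps 3-7 for m = 1: average L first-order periodograms."""
    rng = np.random.default_rng(seed)
    Nx = len(A) - 1
    k = np.arange(1, Nx + 1)
    rot = np.exp(-2j * np.pi * np.arange(R) / R)   # the R possible phase factors
    acc = np.zeros(Nx, dtype=complex)
    for _ in range(L):
        x, s = make_block(A, R, N, rng)
        Y = np.fft.fft(system(x)) / N              # step 4: Y(k) with 1/N scaling
        Z = Y[:, None] * rot[None, :]              # step 5: Z(k, i) = Y(k) exp(-j 2 pi i / R)
        acc += Z[k, s[k]]                          # step 6: I(k) = Z(k, s(k))
    return acc / (L * A[1:])                       # step 7 (assumed normalization by A(k))

# Usage on a hypothetical linear FIR system in periodic steady state.
h = np.array([1.0, 0.5, -0.2])

def fir(x):
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, len(x))).real

A = np.concatenate(([0.0], np.ones(20)))           # flat spectrum, Nx = 20
H1 = first_order_kernel(fir, A, R=16, N=128, L=10)
```

For a purely linear system every block gives the same value $H_1(k)A(k)$, so the averaged estimate coincides with the DFT of `h` at $k=1,\ldots,N_x$. Higher-order periodograms would reuse the same array `Z`, indexed by the partial sums of frequencies and of the random integers modulo $R$.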

This kernel estimation algorithm is also based on the idea of scanning the definition regions $D_m$ so that the partial sums $k_1+\ldots+k_m$ and $\{s_l(k_1)+\ldots+s_l(k_m)\} \bmod R$ obtained for the $m$-th order kernel estimate can be reused to calculate the $(m+1)$-th order periodogram.

Next, we describe a simple recursive method for generating the domains $D_m$. The one-dimensional frequency domain consists of the discrete frequencies $D_1=\{0,1,\ldots,N/2\}$. The two-dimensional frequency domain consists of the sum interaction domain

$$D_2^{+} = \{(k_1,k_2):\ k_1 < k_2 < N/2-k_1\}$$

and the difference interaction domain

$$D_2^{-} = \{(-k_1,k_2):\ k_1 < k_2 \le N/2\}.$$

In the general case, the $m$-dimensional frequency domain can be generated from the $(m-1)$-dimensional one. The sum $D_m^{+}$ and the difference $D_m^{-}$ interaction domains are expressed as follows:

-ki)<k„<  Njl,  0} 

for all frequency points $(k_1,\ldots,k_{m-1}) \in D_{m-1}$, using the sum of the indices already chosen. The whole domain is formed as the union of these two domains: $D_m = D_m^{+} \cup D_m^{-}$. Note that for every newly generated point $(k_1,\ldots,k_m)$ of the frequency domain the indices are ordered such that $k_1 \le k_2 \le \ldots \le k_m$.

Since the number of combinations $(k_1,\ldots,k_m)$ contained in the kernel definition region $D_m$ increases very rapidly with the order $m$ of the kernel, while the number of multiplications required to calculate the array $Z_l(k,i)$ of all possible values of the periodograms does not depend on $m$, the proposed algorithm markedly decreases the number of multiplications compared with the methods of [7, 8].

Specifically, the most efficient method [8] requires approximately $L(C_1+2C_2+\ldots+mC_m)$ complex multiplications, whereas the proposed method requires only $LR(N_y+1)/2$ multiplications. For comparison, the table below gives the reduction factor of multiplications for $N_x=N_y$, $R=8$ and various values $m$ of the kernel order. The computational efficiency thus increases rapidly with the kernel order.


Table. Reduction factor of multiplications

Kernel      The number N_y of output frequencies
order m     16       32       64       128
1           1        1        1        1
2           3        6        12       24
3           26       110      447      1804

5.  Statistical  properties  of  kernel  estimates 

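Averaging the estimate (6) yields

$$E\{\hat{H}_m(k_1,\ldots,k_m)\} = H_m(k_1,\ldots,k_m).$$

Therefore the estimate is unbiased.

The mean square error of the kernel estimate is given by the variance

$$\mathrm{Var}[\hat{H}_m(k_1,\ldots,k_m)] = \frac{1}{L^2}\sum_{l_1=1}^{L}\sum_{l_2=1}^{L}\frac{\mathrm{cov}\{I_{l_1},I_{l_2}\}}{\prod_{i=1}^{m}A^2(k_i)}. \tag{8}$$

Taking into account the independence of the random phases $\varphi(k)$, for the covariance we obtain

$$\mathrm{cov}\{I_{l_1},I_{l_2}\} = \begin{cases} 0, & l_1 \neq l_2 \\ E\{|Y_l(k_1+\ldots+k_m)|^2\} - |H_m(k_1,\ldots,k_m)|^2\prod_{i=1}^{m}A^2(k_i), & l_1 = l_2 = l. \end{cases} \tag{9}$$

Substituting (9) into (8), we have

$$\mathrm{Var}[\hat{H}_m(k_1,\ldots,k_m)] = \frac{1}{L}\left(\frac{S_y(k_1+\ldots+k_m)}{N\prod_{i=1}^{m}A^2(k_i)} - |H_m(k_1,\ldots,k_m)|^2\right) \tag{10}$$

where $S_y(k)$ is the spectrum of the output process $y(n)$. Since the variance (10) goes to zero as $L\to\infty$, the estimate is consistent.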

To clarify the meaning of the derived expression (10), let us use the following formula for the output spectrum

$$S_y(k) = N\sum_{m}\ \sum_{D_m:\ k_1+\ldots+k_m=k}|H_m(k_1,\ldots,k_m)|^2\prod_{i=1}^{m}A^2(k_i). \tag{11}$$

Then the output spectrum at the point $(k_1+\ldots+k_m)$ may be represented as the sum

$$S_y(k_1+\ldots+k_m) = N|H_m(k_1,\ldots,k_m)|^2\prod_{i=1}^{m}A^2(k_i) + S'_y(k_1+\ldots+k_m). \tag{12}$$

Here the first term is caused by the kernel $H_m(k_1,\ldots,k_m)$ at the point $(k_1,\ldots,k_m)$, and the second by the values of the kernels at the remaining points.

After substituting (12) into (10), we obtain

$$\mathrm{Var}[\hat{H}_m(k_1,\ldots,k_m)] = \frac{S'_y(k_1+\ldots+k_m)}{NL\prod_{i=1}^{m}A^2(k_i)}.$$

Thus, the variance of the estimate $\hat{H}_m(k_1,\ldots,k_m)$ is directly proportional to the output spectrum at the point $(k_1+\ldots+k_m)$, excluding the component caused by the value of the kernel being estimated. The less this value contributes to the total spectrum, the larger the error. The estimates tend to become worse as the nonlinearity becomes stronger, since the output spectrum is enriched by new frequency components. If the system is linear, the output spectrum $S_y(k)$ is entirely defined by the first-order kernel $H_1(k)$. Therefore, in this case the variance of the estimate $\hat{H}_1(k)$ is equal to zero.

Modeling the input-output power relationship by (11) requires the squared kernel magnitudes. However, using $|\hat{H}_m(k_1,\ldots,k_m)|^2$ as the estimate of $|H_m(k_1,\ldots,k_m)|^2$ can lead to large errors because it is biased for finite $L$.

Possibly the best estimate is

$$\widehat{|H_m(k_1,\ldots,k_m)|^2} = |\hat{H}_m(k_1,\ldots,k_m)|^2 - \widehat{\mathrm{Var}}[\hat{H}_m(k_1,\ldots,k_m)]. \tag{13}$$

Substituting (10) into (13), it can be written as follows

$$\widehat{|H_m(k_1,\ldots,k_m)|^2} = \frac{L}{L-1}|\hat{H}_m(k_1,\ldots,k_m)|^2 - \frac{\hat{S}_y(k_1+\ldots+k_m)}{N(L-1)\prod_{i=1}^{m}A^2(k_i)} \tag{14}$$

where $\hat{S}_y(k) = \frac{N}{L}\sum_{l=1}^{L}|Y_l(k)|^2$ is an estimate of $S_y(k)$.

It can be shown that the estimate (14) is unbiased and is therefore more suitable for modeling spectral relations in nonlinear systems.
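The bias correction in (14) can be illustrated with a short Python sketch. It assumes the reading of (14) in which the naive value $|\hat{H}_m|^2$ is scaled by $L/(L-1)$ and the averaged output power is subtracted, which is the standard unbiased estimator of the squared magnitude of a mean. The function name `kernel_mag2` and its arguments are hypothetical; `I` holds the $L$ periodograms $I_l$ at one frequency point and `A_prod` is the product $A(k_1)\cdots A(k_m)$.

```python
import numpy as np

def kernel_mag2(I, A_prod):
    """Naive and bias-corrected estimates of |H_m|^2 at one point (k1, ..., km)."""
    I = np.asarray(I, dtype=complex)
    L = I.size
    H_hat = I.mean() / A_prod                     # averaged kernel estimate
    mean_power = np.mean(np.abs(I) ** 2)          # mean of |Y_l|^2, since |I_l| = |Y_l|
    naive = np.abs(H_hat) ** 2
    corrected = L / (L - 1) * naive - mean_power / ((L - 1) * A_prod ** 2)
    return naive, corrected

# Identical periodograms (no scatter): both estimates agree.
naive, corrected = kernel_mag2([2 + 1j] * 5, A_prod=2.0)
```

When the periodograms do scatter, the naive value is inflated by the variance of the average, and the corrected value removes that excess. Like any unbiased estimate of a squared magnitude, it can turn out slightly negative when the true kernel value is close to zero.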

6.  Experimental  results 

To test the results described in the previous sections, Matlab programs were written on the basis of the proposed identification algorithm.

The method developed in this paper was tested by applying two kinds of inputs to the known system in Fig. 1: a zero-mean Gaussian input, and the composite-frequency input signal in the form (2). The kernels calculated by the Wiener method are compared with those obtained by the method developed in this paper. Both methods give approximately identical estimates of the kernels in the frequency domain. However, a comparison of the computation times of these methods on an IBM PC shows the high speed of the proposed algorithm and confirms the theoretical calculation given above in the table.

The estimates of the first three kernels of the known system were calculated according to (6) by averaging $L=700$ blocks of data containing $N=128$ points each. To stimulate the system, the test signal in the form (2) with $R=16$ random phases and $N_x=20$ spectral samples was used. Using the obtained kernel estimates, the relevant contributions to the total response were calculated. These are shown in Fig. 2. The effect of adding the nonlinear contributions was analyzed by using a mean squared error test. We obtained the following averaged reduction for the sequence of models: linear $y_1(n)$ - 39.6, quadratic $y_2(n)$ - 17.5 and cubic $y_3(n)$ - 15.4.

Figs. 3-4 show the first and second order kernel estimates, respectively. Note that nonlinearity leads to larger errors, especially for those frequency components which make a negligible contribution to the output spectrum.

For modeling the spectral input-output relationship, the squared kernel magnitudes were also calculated by using (14). These estimates were used to calculate the output spectrum $S_m(k)$ of the model and its components $S_{m1}(k)$, $S_{m2}(k)$, $S_{m3}(k)$ contributed by the first three kernels. Fig. 5 shows these spectral estimates together with the actual values $S_y(k)$, $S_{y1}(k)$, $S_{y2}(k)$, $S_{y3}(k)$ obtained analytically. Note that the obtained spectral estimates are unbiased and approximate the actual spectra fairly well. The model spectrum is also close to the spectral estimate $\hat{S}_y(k)$ obtained directly from the output data.

The first-order kernel makes the main contribution to the output spectrum, and it can be evaluated most accurately. Since the power of the quadratic term $S_{y2}(k)$ is mainly concentrated at low frequencies, the errors of the estimate $\hat{H}_2(k_1,k_2)$ become larger as $(k_1+k_2)$ moves toward the high-frequency region. The third term $S_{y3}(k)$ contributes the least power to the output spectrum and therefore appears to be the most corrupted.


Figure 1. Known nonlinear system used for testing: a11=-1.5681, a12=0.64, b11=-1, a21=-1.2895, a22=0.49, c1=0.1, c2=0.03, c3=0.001.


Figure 2. Simulation results: (a) input signal x(n); (b-d) actual output signal y(n) and model components y1(n), y2(n), y3(n).

8.  Conclusion 

An algorithm for the identification of discrete nonlinear systems in terms of an orthogonal series was presented. The process generated by the inverse FFT algorithm was used as a test signal. For this input, the G-functionals and Wiener kernels were defined in the frequency domain. The proposed algorithm offers a significant reduction in computational complexity compared with the known methods, since the number of multiplications does not depend on the kernel order.


Figure 3. Magnitude of the first order kernel: (a) actual, (b) estimated.

Figure 4. Magnitude of the second order kernel: (a) actual value; (b) kernel estimate; (c) contour plot of the kernel estimate.

Figure 5. Modeling of the output spectral components.


Abstract

Second-Order Statistics (SOS) have been widely used for the detection and estimation of coherent sinusoids in additive wide-band noise. This paper addresses the detection and estimation of harmonics which have been corrupted by both multiplicative and additive noise. Higher-Order Statistics (HOS) are useful in estimating harmonics of zero mean amplitude, where SOS generally fail. The paper analyzes and compares the performance of SOS and HOS when the harmonic has both coherent and non-coherent powers. We determine thresholds on the coherent-to-non-coherent sine wave power ratio which delimit the regions of optimality of SOS and HOS. Gaussian as well as non-Gaussian noise sources are studied.


1  Introduction 

Harmonic retrieval is one of the classical and important problems in statistical signal processing. The large existing literature mostly addresses harmonics in additive noise. Multiplicative noise models occur in several applications, e.g. underwater propagation signals [2], radar signals [1], fading communication channels [8] (see [7] for other applications).

Consider a discrete-time harmonic in multiplicative and additive noise

$$x(t) = [A + b(t)]\,e^{j(\omega t+\phi)} + v(t), \quad t = 0,\ldots,N-1 \tag{1}$$

with the following assumptions: AS1) $A$ is a positive deterministic constant; AS2) $b(t)$ is a real white multiplicative noise with variance $\sigma_b^2$ and $k$th-order moment $m_{kb}$, $k>2$; AS3) $\omega$ and $\phi$ are the harmonic frequency and phase, assumed to be a constant in the range $(0,\pi)$ and a random variable uniformly distributed over $(-\pi,\pi]$, respectively; AS4) $v(t)$ is a complex white additive noise with variance $\sigma_v^2$ and $k$th-order absolute moment $m_{kv}$, $k>2$; AS5) the multiplicative and additive noise sources are mutually independent.

Several techniques based on Second-Order Statistics (SOS) have been proposed for harmonic retrieval in the additive noise case, i.e. $A \neq 0$ and $\sigma_b^2 = 0$. More recently, some authors (e.g. [6]) advocated the use of HOS when the additive noise is colored and Gaussian. This is motivated by the ability of HOS to suppress the effects of this kind of noise. On the other hand, there is no motivation to use HOS when the Gaussianity assumption is not satisfied.

For the noise model (1), SOS are useless for harmonic retrieval if $A = 0$. HOS are then required, provided $\sigma_b^2 \neq 0$. Fourth-order statistics have been shown to be an efficient tool for this particular harmonic retrieval problem [7], [3].

Both SOS and HOS are useful for harmonic retrieval in the general case where $A \neq 0$ and $\sigma_b^2 \neq 0$. So, the following question must be posed: given a multiplicative and additive noisy environment, which statistics are optimal for the detection and estimation of hidden periodicities? The goal of this paper is to provide statistical tools that answer this question. We will show that HOS outperform SOS when the multiplicative noise power exceeds a threshold which depends upon $A$ and the noise distributions.

2  Harmonic  Retrieval  using  SOS  and 
HOS 

Under assumptions AS2-AS5 and $A = 0$, the moments and cumulants of $x(t)$ are identically zero except over a finite set of hyperplanes. In this paper, we focus on the fourth-order statistics. The fourth-order moment is non-zero only along the 1-D slice $m_{4x}(0,\tau,\tau)$. Since the additive noise is not necessarily Gaussian, cumulants are not considered here. Thus, the comparison of SOS and HOS performances will be based on $m_{2x}(\tau)$ and $m_{4x}(0,\tau,\tau)$ which are obtained, in the general case (i.e. $A, \sigma_b^2, \sigma_v^2 \neq 0$), as

$$m_{2x}(\tau) = E\{x^*(t)x(t+\tau)\} = A^2 e^{j\omega\tau} + (\sigma_b^2+\sigma_v^2)\,\delta(\tau) \tag{2}$$

$$m_{4x}(0,\tau,\tau) = E\{x^*(t)x^*(t)x(t+\tau)x(t+\tau)\} = (A^2+\sigma_b^2)^2 e^{j2\omega\tau} + \left[4A^2\sigma_b^2 + 4A\,m_{3b} + m_{4b} - \sigma_b^4 + 4\sigma_v^2(A^2+\sigma_b^2) + m_{4v}\right]\delta(\tau) \tag{3}$$

where $m_{3b} = E\{b^3(t)\}$ and $m_{4b} = E\{b^4(t)\}$.

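The spectrum $S_{2x}(\lambda)$ and the trispectrum slice $S_{4x}(0,\lambda,\lambda)$ peak at $\omega$ and $2\omega$ respectively:

$$S_{2x}(\lambda) = \sum_{\tau=-\infty}^{\infty} m_{2x}(\tau)e^{-j\lambda\tau} = A^2\delta(\lambda-\omega) + \sigma_b^2 + \sigma_v^2 \tag{4}$$

$$S_{4x}(0,\lambda,\lambda) = \sum_{\tau=-\infty}^{\infty} m_{4x}(0,\tau,\tau)e^{-j\lambda\tau} = (A^2+\sigma_b^2)^2\delta(\lambda-2\omega) + 4A^2\sigma_b^2 + 4A\,m_{3b} + m_{4b} - \sigma_b^4 + 4\sigma_v^2(A^2+\sigma_b^2) + m_{4v} \tag{5}$$

Note that the $m_{4x}(0,\tau,\tau)$-based methods require $\omega$ to be in $(0,\pi/2)$ in order to avoid aliasing.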

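It is worth noting that $m_{4x}(0,\tau,\tau)$ is the covariance function of $x^2(t)$. The detection and estimation of a hidden periodicity in (1) can then be performed by applying to $x(t)$ or $x^2(t)$ one of the numerous techniques for detecting and estimating harmonics in additive noise. For example, the harmonic frequency can be estimated using the peak picking technique. That is, $\omega$ can be estimated consistently by

$$\hat{\omega}_{SOS} = \arg\max_{\lambda>0} \hat{S}_{2x}(\lambda) \tag{6}$$

provided $A \neq 0$; or

$$\hat{\omega}_{HOS} = \frac{1}{2}\arg\max_{\lambda>0} \hat{S}_{4x}(0,\lambda,\lambda) \tag{7}$$

provided $A \neq 0$ or/and $\sigma_b^2 \neq 0$, where

$$\hat{S}_{2x}(\lambda) = \frac{1}{N}\left|\sum_{t=0}^{N-1} x(t)e^{-j\lambda t}\right|^2 \tag{8}$$

$$\hat{S}_{4x}(0,\lambda,\lambda) = \frac{1}{N}\left|\sum_{t=0}^{N-1} x^2(t)e^{-j\lambda t}\right|^2 \tag{9}$$

which are simply the periodograms of $x(t)$ and $x^2(t)$ respectively.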
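The two peak-picking estimators can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the maximization is restricted to the FFT grid instead of a continuous search over $\lambda$, and the synthetic record is drawn from model (1) with zero mean amplitude, Gaussian noise sources, and a frequency chosen on the grid below $\pi/2$.

```python
import numpy as np

def peak_frequency(z):
    """Frequency (rad/sample) of the periodogram peak of z over (0, pi)."""
    N = len(z)
    P = np.abs(np.fft.fft(z)) ** 2 / N          # periodogram on the FFT grid
    k = 1 + np.argmax(P[1:N // 2])              # search bins with 0 < lambda < pi
    return 2 * np.pi * k / N

def omega_sos(x):
    return peak_frequency(x)                    # peak of the periodogram of x(t)

def omega_hos(x):
    return 0.5 * peak_frequency(x ** 2)         # half the peak location for x^2(t)

# Synthetic record from model (1) with A = 0 (assumed test values).
rng = np.random.default_rng(0)
N, k0 = 4096, 300
omega = 2 * np.pi * k0 / N                      # on-grid frequency, below pi/2
t = np.arange(N)
phi = rng.uniform(-np.pi, np.pi)
b = rng.normal(0.0, 1.0, N)                     # real white multiplicative noise
v = 0.1 * (rng.normal(size=N) + 1j * rng.normal(size=N))
A = 0.0
x = (A + b) * np.exp(1j * (omega * t + phi)) + v
w_hos = omega_hos(x)
```

With $A=0$ the periodogram of $x(t)$ carries no line at $\omega$, so `omega_sos` returns an essentially arbitrary bin, while squaring the data restores a line at $2\omega$ that `omega_hos` picks up.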

3  SOS  versus  HOS 


The signal $x(t)$ can be written as

$$x(t) = s_1(t) + e_1(t) \tag{10}$$

where

$$s_1(t) = A\,e^{j(\omega t+\phi)}$$

and

$$e_1(t) = b(t)\,e^{j(\omega t+\phi)} + v(t).$$

The SOS-based methods regard $s_1(t)$ as the desired signal (coherent sinusoid) and $e_1(t)$ as a white additive noise. The signal and noise components are uncorrelated. Using this signal and noise decomposition, the SNR relative to SOS can be defined as

$$\mathrm{SNR}_{SOS} = \frac{E\{|s_1(t)|^2\}}{E\{|e_1(t)|^2\}} = \frac{A^2}{\sigma_b^2+\sigma_v^2}. \tag{11}$$

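In the same manner, the signal and noise decomposition of $x^2(t)$ is given by

$$x^2(t) = s_2(t) + e_2(t) \tag{12}$$

where

$$s_2(t) = (A^2+\sigma_b^2)\,e^{j(2\omega t+2\phi)} \tag{13}$$

$$e_2(t) = \left[b^2(t) - \sigma_b^2 + 2A\,b(t)\right]e^{j(2\omega t+2\phi)} + 2v(t)\left[A + b(t)\right]e^{j(\omega t+\phi)} + v^2(t). \tag{14}$$

$e_2(t)$ is a (wide-sense) white noise and is uncorrelated with $s_2(t)$. The SNR relative to HOS is then obtained as

$$\mathrm{SNR}_{HOS} = \frac{(A^2+\sigma_b^2)^2}{4A^2\sigma_b^2 + 4A\,m_{3b} + m_{4b} - \sigma_b^4 + 4\sigma_v^2(A^2+\sigma_b^2) + m_{4v}}. \tag{15}$$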

For both SOS and HOS, the greater the SNR, the better the harmonic detection and estimation performance. We can therefore define the relative efficiency of HOS with respect to (w.r.t.) SOS as

$$\rho \triangleq \frac{\mathrm{SNR}_{HOS}}{\mathrm{SNR}_{SOS}}. \tag{16}$$

Throughout this paper, HOS are said to be optimal if $\rho > 1$ and SOS are optimal otherwise.

Before treating the multiplicative noise effects, note that $\rho$ is uniformly lower than 1 in the pure additive noise case (i.e. $\sigma_b^2 = 0$):

$$\rho = \left(4 + \kappa_{4v}\frac{\sigma_v^2}{A^2}\right)^{-1} < 1$$

where $\kappa_{4v}$ is the kurtosis:

$$\kappa_{4v} = \frac{m_{4v}}{\sigma_v^4}.$$

Hence, SOS are the optimal statistics in this case. The following shows that this conclusion can fail in the presence of multiplicative noise.

Below, we limit our study to multiplicative noise sources whose third-order moments vanish. This condition is satisfied for symmetrically distributed multiplicative noise.

3.1  Pure Multiplicative Noise


In the absence of additive noise, $\mathrm{SNR}_{SOS}$ and $\mathrm{SNR}_{HOS}$ can be rewritten as

$$\mathrm{SNR}_{SOS} = \frac{A^2}{\sigma_b^2} \tag{18}$$

$$\mathrm{SNR}_{HOS} = \frac{\left(A^2/\sigma_b^2 + 1\right)^2}{4A^2/\sigma_b^2 + \kappa_{4b} - 1} \tag{19}$$

where $\kappa_{4b}$ is the kurtosis of the multiplicative noise:

$$\kappa_{4b} = \frac{m_{4b}}{\sigma_b^4}.$$

Proposition 1. HOS are optimal w.r.t. SOS if the coherent-to-non-coherent sine wave power ratio satisfies the following condition

$$\frac{A^2}{\sigma_b^2} < \theta_1 \tag{C1}$$

where

$$\theta_1 = \frac{1}{6}\left[-(\kappa_{4b}-3) + \sqrt{(\kappa_{4b}-3)^2 + 12}\right]. \quad \Box$$

$\theta_1$ is a monotonically decreasing function of $\kappa_{4b}$. Thus, the lower $\kappa_{4b}$, the larger the region in which HOS are optimal. SOS are certainly optimal if (using $\kappa_{4b} \geq 1$)

$$\frac{A^2}{\sigma_b^2} > \max(\theta_1) = 1 \tag{20}$$

whatever the multiplicative noise pdf.

To study the relationship between the threshold $\theta_1$ and the noise pdf, the generalized zero-mean Gaussian pdf is used. It is a symmetric density function parameterized by two constants, the shape parameter $\alpha > 0$ and the scale or size parameter $\beta > 0$. It is defined by

$$f(b) = \frac{\alpha}{2\beta\,\Gamma(1/\alpha)}\exp\left[-\left(\frac{|b|}{\beta}\right)^{\alpha}\right] \tag{21}$$

where $\Gamma(\cdot)$ is the gamma function. The Gaussian case is obtained for $\alpha = 2$, and for lower values of $\alpha$ the tails of $f$ decay at a lower rate than in the Gaussian case. Thus, the generalized Gaussian pdf gives densities ranging from the Gaussian to those with much faster or much slower rates of exponential decay of their tails.

The corresponding absolute moments are given by

$$E\{|b|^k\} = \beta^k\,\frac{\Gamma\left((k+1)/\alpha\right)}{\Gamma(1/\alpha)}. \tag{22}$$

$\kappa_{4b}$ is then found to be

$$\kappa_{4b} = \frac{\Gamma(1/\alpha)\,\Gamma(5/\alpha)}{\Gamma^2(3/\alpha)}. \tag{23}$$

$\kappa_{4b}$ is a decreasing function of $\alpha$. Thus, the performance gain using HOS increases with $\alpha$. Figure 1 displays $\theta_1$ as a function of $\alpha$.

For Gaussian multiplicative noise ($\alpha = 2$), (C1) becomes

$$\frac{A^2}{\sigma_b^2} < \frac{1}{\sqrt{3}} \approx 0.57. \tag{24}$$

$\mathrm{SNR}_{HOS}$ versus $\mathrm{SNR}_{SOS}$ is shown in Figure 2. $\mathrm{SNR}_{HOS}$ remains almost constant ($\approx 0.5$) for $\mathrm{SNR}_{SOS}$ ranging from 0 to 0.57. Figure 4 shows that the simulation results are in accordance with the theoretical predictions. The triperiodogram slice (9) provides superior enhancement of the line spectrum w.r.t. the periodogram for small values of $A^2/\sigma_b^2$. The converse is true when $A^2/\sigma_b^2$ exceeds 0.57.


3.2  Multiplicative and Additive Noise Sources


In this section, we study the influence of the additive noise on the above optimality results. The following generalizes Proposition 1.

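Proposition 2. HOS are optimal w.r.t. SOS if the coherent-to-non-coherent sine wave power ratio satisfies the following condition

$$\frac{A^2}{\sigma_b^2} < \theta_2 = \frac{1}{6}\left(-\gamma + \sqrt{\gamma^2+12}\right) \tag{C2}$$

where

$$\gamma = \left(\kappa_{4v}\frac{\sigma_v^4}{\sigma_b^4} + 2\frac{\sigma_v^2}{\sigma_b^2} + \kappa_{4b} - 3\right)\Big/\left(1+\frac{\sigma_v^2}{\sigma_b^2}\right). \quad \Box$$

For Gaussian multiplicative and additive noise sources, the optimality threshold in (C2) can be rewritten as

$$\theta_2 = \frac{1}{3}\left(-\frac{\sigma_v^2}{\sigma_b^2} + \sqrt{\frac{\sigma_v^4}{\sigma_b^4}+3}\right).$$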


Figure 3 displays the regions of optimality of SOS and HOS. It appears that for fixed values of $A$ and $\sigma_b^2$, the additive noise does not significantly degrade the optimality performance of HOS provided $\sigma_v^2/\sigma_b^2 \leq 0.1$.
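The thresholds of Propositions 1 and 2 can be checked numerically with a short Python sketch. It is written under stated assumptions: the function names are hypothetical, all quantities are normalized by $\sigma_b^2$ with $r = A^2/\sigma_b^2$ and $s = \sigma_v^2/\sigma_b^2$, the third-order moment $m_{3b}$ is zero, the threshold is obtained by solving $\rho = 1$ from (11) and (15), and the default additive-noise kurtosis of 2 corresponds to a circular complex Gaussian noise.

```python
from math import gamma, sqrt

def kurtosis_gg(alpha):
    """Kurtosis m4/sigma^4 of the zero-mean generalized Gaussian pdf."""
    return gamma(1 / alpha) * gamma(5 / alpha) / gamma(3 / alpha) ** 2

def snr_sos(r, s=0.0):
    """SNR_SOS with r = A^2/sigma_b^2 and s = sigma_v^2/sigma_b^2."""
    return r / (1 + s)

def snr_hos(r, s=0.0, k4b=3.0, k4v=2.0):
    """SNR_HOS for m3b = 0, numerator and denominator divided by sigma_b^4."""
    return (r + 1) ** 2 / (4 * r + k4b - 1 + 4 * s * (r + 1) + k4v * s ** 2)

def theta(k4b=3.0, s=0.0, k4v=2.0):
    """Largest r for which rho = SNR_HOS / SNR_SOS exceeds 1."""
    g = (k4v * s ** 2 + 2 * s + k4b - 3) / (1 + s)
    return (-g + sqrt(g ** 2 + 12)) / 6

theta_gauss = theta(kurtosis_gg(2.0))            # pure multiplicative Gaussian noise
theta_laplace = theta(kurtosis_gg(1.0))          # heavier tails, smaller threshold
```

Evaluating `theta` over a grid of $\alpha$ or of $s$ gives curves of the kind described for Figures 1 and 3, and at $r = \theta$ the two SNR functions coincide by construction.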


4  Conclusions

This paper has considered the problem of detecting and estimating a complex sine wave in both multiplicative and additive noise. HOS are shown to be preferable to SOS in certain multiplicative and additive noisy environments. This is true not only for zero mean harmonic amplitude, but also for values of the mean ranging from 0 to a threshold which depends on the probability density functions of the noise sources. The choice of the optimal statistics is crucial for short data record lengths. It is also shown that the performance gain using SOS increases for heavy-tailed noise pdfs. The optimality conditions established in this paper are also valid for the choice between the cyclic mean and cyclic variance methods proposed in [9] and [5]. More generally, this paper determines which input, $x(t)$ or $x^2(t)$, is best (in the sense of maximizing the SNR) to present to linear processors (e.g. FFT-based detectors, LMS adaptive line enhancer [4], etc.). Finally, our results can be easily extended to polynomial phase signals.


Fig. 1: Plot of threshold $\theta_1$ as a function of $\alpha$.

Fig. 2: $\mathrm{SNR}_{HOS}$ versus $\mathrm{SNR}_{SOS}$. Gaussian multiplicative noise.

Fig. 3: Regions of optimality of SOS and HOS. Gaussian multiplicative and additive noise.

Fig. 4: Normalized periodogram (left) and triperiodogram (right) for different values of $A^2/\sigma_b^2$. Gaussian multiplicative and additive noise.


Abstract

In this paper, a new Volterra series-based adaptive preprocessing technique is presented to linearize weakly nonlinear systems whose linear parts are invertible. In particular, a systematic but simple design procedure is proposed for the compensation of system nonlinearities up to a required order, yielding a substantial reduction of the computational burden. Simulation results are also provided to test the performance of the proposed approach.


1.  Introduction 

There  have  been  many  signal  processing  applications 
where  undesirable  nonlinearities,  inherent  in  a  system, 
might  degrade  the  overall  system  performance[l-5].  To 
compensate  for  such  unwanted  nonlinear  effects,  various 
linearization  methods  have  been  proposed  [2-5],  among 
which  the  Pth-order  inverse  is  one  of  the  commonly  used 
techniques.  Although  the  Pth-order  inverse  approach  has  a 
firm theoretical basis [1], its design complexity might limit
its  application  in  real  situations:  e.g.,  as  the  memory  size 
of  the  system  and  the  order  of  system  nonlinearities 
increase,  its  design  procedure  becomes  much  more 
complex.  To  relieve  such  a  complexity  in  the  design 
procedure,  an  adaptive  linearization  approach[2]  was 
proposed,  where  only  the  lowest-order  nonlinear  part, 
regarded  as  a  dominant  factor  degrading  the  overall 
system  performance,  is  scheduled  to  be  canceled  out  in  the 
total  system  output.  However,  even  in  case  of  such  a 
lowest-order  inverse  approach,  the  overall  system 
performance  still  depends  on  the  amount  of  remaining 
nonlinear  parts  in  the  system  output. 

In  this  paper,  a  new  adaptive  nonlinear  preprocessing 
technique,  based  on  Volterra  series,  is  proposed  for  the 
linearization  of  weakly  nonlinear  systems  with  their  linear 
parts  being  invertible,  where  (i)  its  design  procedure  is 
still  simple  and  (ii)  system  nonlinearities  up  to  a  certain 
order  can  be  compensated  for  in  a  systematic  way. 


The  approach  of  this  paper  is  based  on  the  extension  of 
the  lowest-order  inverse[2]  and  cascade  connection  of  J 
basic  preprocessing  (or  predistortion)  blocks,  leading  to 
substantial  reduction  of  computational  burden  (compared 
with  that  in  the  conventional  linearization  methods:  e.g., 
the  Pth-order  inverse). 

In  the  following  section,  the  problem  of  designing  a 
nonlinear  compensator,  based  on  Volterra  series,  is 
considered,  along  with  the  contraction  mapping  theorem. 
Then,  the  computational  complexity  required  in  the 
proposed  approach  is  discussed  in  section  3.  Finally,  to 
demonstrate  the  performance  and  applicability  of  the 
proposed  approach,  some  simulation  results  using  a 
baseband  satellite  communication  channel  model  and  a 
sample-and-hold circuit model are provided in section 4.


2.  Design  of  a  Volterra  series-based 
preprocessor 


The  Nth-order  Volterra  series  representation  of  a 
weakly  nonlinear  system  can  be  given  by 


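$y[k] = \sum_{n=1}^{N} \sum_{i_1=0}^{M-1} \sum_{i_2=0}^{M-1} \cdots \sum_{i_n=0}^{M-1} h_n(i_1,\ldots,i_n)\, x[k-i_1] \cdots x[k-i_n] = \sum_{n=1}^{N} H_n(x)$  (1)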


where $M$ is the memory size of the system, and $x[k]$ and
$y[k]$ denote the input and the output, respectively. In
addition, $h_n(i_1,\ldots,i_n)$ is an nth-order Volterra kernel,
and $H_n(\cdot)$ is the nth-order Volterra operator [1].
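
As an illustration of eq.(1), the following minimal sketch evaluates a truncated Volterra series by brute force. It is not taken from the paper: the function name `volterra_output`, the storage of each kernel $h_n$ as an n-dimensional NumPy array, and the zero initial conditions are all assumptions made for this example.

```python
import itertools
import numpy as np

def volterra_output(x, kernels):
    """Evaluate y[k] = sum_n sum_{i_1..i_n} h_n(i_1..i_n) x[k-i_1]...x[k-i_n].

    kernels[n-1] is assumed to be an n-dimensional array of shape (M,)*n
    holding the nth-order kernel h_n. Samples before k = 0 are taken as zero.
    """
    x = np.asarray(x, dtype=float)
    M = kernels[0].shape[0]
    y = np.zeros(len(x))
    for k in range(len(x)):
        # Delayed inputs x[k], x[k-1], ..., x[k-M+1]
        past = np.array([x[k - i] if k - i >= 0 else 0.0 for i in range(M)])
        for n, h in enumerate(kernels, start=1):
            for idx in itertools.product(range(M), repeat=n):
                y[k] += h[idx] * np.prod(past[list(idx)])
    return y

# Hypothetical second-order example: y[k] = x[k] + 0.5 x[k-1] + 0.1 x[k]^2
h1 = np.array([1.0, 0.5])
h2 = np.array([[0.1, 0.0], [0.0, 0.0]])
y = volterra_output([1.0, 2.0, 3.0], [h1, h2])
```

The nested loops make the cost of the nth-order term grow as $M^n$ per output sample, which is the complexity issue the later sections address.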

The structure of the proposed Volterra series-based
nonlinear compensator is presented in Fig. 1, where $\hat{H}_1$
and $\hat{H}_n$, respectively, denote the estimated linear
and nonlinear kernels of a given system, obtained
adaptively through the system identification procedure by
utilizing second- and higher-order statistical information
of the input and between the input and output. Also,
$\hat{H}_1^{-1}$ is a linear inverse derived from $\hat{H}_1$. In Fig.2(a), the
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structure  of  the  multi-block  (here,  J-block)  compensator 
proposed  in  this  paper  is  presented,  where  the  J-block 
compensator  can  be  implemented  in  a  systematic  way  by 
adding  a  basic  block  G(see  Fig.2(b)),  composed  of 
and  ^  physical  system,  to  the  (J-1)- 

block  compensator.  Note  that  (i)  application  of  the 
proposed  J-block  compensator  leads  to  elimination  of 
system  nonlinearities  up  to  (J+l)-order,  when  the  second- 
and  higher-order  kernels  of  a  given  system  are  not  equal 
to  zero,  and  (ii)  the  structure  of  the  lowest-order 
inverse [2] coincides with that of a one-block nonlinear
compensator without compensating for linear distortion.
Note that the lowest-order inverse compensator [2] is
composed of a pure nonlinear filter followed
by a linear inverse filter ($\hat{H}_1^{-1}$).

The  new  compensator  proposed  in  this  paper  is  shown 
in  Fig.  1-2,  where  v  =  is  in  parallel 

with  the  linear  inverse  system  ( //,'  ).  Then,  the 
predistorted  signal  is  y  =  >  nnd  the 

output  of  the  physical  system  to  be  linearized  is  given  by 

x,=hJh;'{x,))-H,{v{x,)) 

(2) 

n=2  ' 

To  meet  the  linearization  condition  x^^  ~  x^y ,  the 
following  should  be  satisfied: 

rt!=2  ' 


For  convenience,  let  x  =  Xj  -  and  define 

the  following: 

$H_1 \hat{H}_1^{-1} = I + e_1$  (4)

$\hat{H}_n = H_n + \Delta_n$  (5)

where $I$ is an identity operator, and $e_1$ and $\Delta_n$ are
error terms due to imperfect kernel estimation.


Then,  eq.(3)  can  be  expressed  as  follows: 


x  =  Xo-77i 


V  V«=2 


«=2 

+  X  A„  (h:'  (X))  -  e/x  <■' 


(6) 


Note that if a weakly nonlinear system is identified
exactly (i.e., $\hat{H}_1 = H_1$ and $\hat{H}_n = H_n$), then eq.(6) yields
the following:

$x = x_0 - \sum_{n=2}^{N} H_n(H_1^{-1}(x))$  (7)


In  addition,  eq.(7)  (i.e.,  finding  x)  can  be  solved  by 
applying  two  methods  discussed  below.  Also,  the  output 
of  the  pre-compensator  can  be  expressed  by 

y  =  W,-‘(x„-.tf,V(x))=W,-'W  (8) 

In  particular,  it  can  be  shown  from  eq.(4)-(6)  that  the 
compensated  output  error  (i.e.  Xo-Xo)  due  to  the 
imperfect  system  identification  can  be  expressed  as 
follows: 

+  e;'ix„) 

Now, let us consider two methods (Method I and Method
II) to solve eq.(7). The difference between the two approaches
is that Method II uses the previously calculated solutions
of eq.(7) for the current time solution to be calculated, but
Method I does not.

2-1.  Method  I:  conventional  solution 

Let $x$ be a fixed point of the mapping $T: X \to X$ in
eq.(7). Then, it can be obtained by the following iterative
procedure, based on the contraction mapping theorem
providing the sufficient condition for the existence of the
fixed point [3,5,6]:

$x_j = x_0 - \sum_{n=2}^{N} \hat{H}_n(\hat{H}_1^{-1}(x_{j-1})) = T(x_{j-1})$  (10)

where $x_0$ is the initial point and the mapping $T(\cdot)$ is
described in Fig. 2(b).
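
The fixed-point iteration of Method I can be sketched numerically. The example below is only an illustration under strong simplifying assumptions that are not in the paper: the system is memoryless (M = 1), its kernels are known exactly, and the coefficients `A1`, `A2`, `A3` are hypothetical values chosen so that the mapping is a contraction on the tested input range. Each pass through the loop plays the role of one basic block T.

```python
import numpy as np

# Hypothetical memoryless weakly nonlinear system: a1*y + a2*y^2 + a3*y^3
A1, A2, A3 = 1.0, 0.1, 0.05

def system(y):
    return A1 * y + A2 * y**2 + A3 * y**3

def nonlinear_part(y):
    # Sum of the second- and third-order operators, H_2 + H_3
    return A2 * y**2 + A3 * y**3

def linear_inverse(x):
    # H_1^{-1} for the memoryless linear part
    return x / A1

def predistort(x0, num_blocks):
    """Iterate x_j = x_0 - sum_n H_n(H_1^{-1}(x_{j-1})) for num_blocks steps,
    then return the predistorted signal y = H_1^{-1}(x_J)."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()  # initial point x_0
    for _ in range(num_blocks):
        x = x0 - nonlinear_part(linear_inverse(x))
    return linear_inverse(x)

x0 = np.linspace(-0.5, 0.5, 11)
# Worst-case linearization error for 0, 1, 2 and 3 blocks
errors = [np.max(np.abs(system(predistort(x0, J)) - x0)) for J in range(4)]
```

In this toy setting the residual error shrinks with every added block, which mirrors the behaviour reported for the J-block compensator; with memory, `linear_inverse` and `nonlinear_part` would become filters acting on sequences.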

It can be shown that another linearization scheme [5],
where the linear inverse filter is located in front of the
compensator and the basic block T is composed of the
pure nonlinear filter followed by the linear inverse filter,
shows the same performance as the compensator of Fig.2.

2-2.  Method  II:  more  efficient  solution 

Now,  consider  the  problem  of  designing  the  adaptive 
preprocessor (as shown in Fig.2) in an efficient way. It is
assumed that the linear part of a physical system being
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linearized  is  invertible.  As  shown  in  Fig.3,  previously 
calculated data $x_j[l]$, $l < k$ ($k$: current time), and the
current output $x_{j-1}[k]$ can be utilized to obtain $x_j[k]$.
However, to utilize the previously calculated data $x_j[l]$,
each block system G needs to be expressed in a Volterra
series form. To avoid it, the following notations are
adopted with the structure of G being maintained:

$X_j = [x_{j-1}[k], x_j[k-1], \ldots, x_j[k-M_L+1]]$  (11)

Yj  +  (12) 

where (i) $X_j$ is the input vector to the linear inverse filter
in the j-th block, (ii) $Y_j$ is the input vector to the pure
nonlinear filter in the j-th block, (iii) $M_L$ is the memory
size of the linear inverse filter, and (iv) $f_j$ is the
intermediate value in the j-th block (see Fig. 3).

Now, we will derive the sufficient condition under which the
above approach yields a contraction mapping (also, see
eq.(13)). To derive such a condition, it is convenient to use
the Volterra series expression of the basic block G (here, let
$G_n$ be the nth-order Volterra operator of G). Then, the

input vector $X'_j$ to the j-th block can be given by


where $M_G = M_L + M - 1$ and $r$, satisfying the
condition of eq.(16), is the upper bound for the applicable
magnitude of the input to each basic block T.


3.  Computational  Complexity 


To  calculate  the  computational  burden  for  the  Volterra 
series-based  preprocessor  discussed  in  section  2.2, 
examine  the  number  of  multiplications  required  per 
system  output.  As  shown  in  Fig.  2,  each  block  G  requires 
$M_L$ multiplications for the linear inverse filter (LIF) and

multiplications  for  the  pure  nonlinear 


part. Since each block is an independent Volterra system
of the same structure, the J-block compensator requires J
times the number of multiplications per output required
for the one-block compensator (i.e., when Method I is applied).

As  shown  in  Fig.  3(i.e.,  Method  II),  some  operations, 
executed  exactly  in  the  same  way,  exist  in  every  block  G 
(i.e.,  in  both  the  linear  inverse  and  the  pure  nonlinear 
operations).  Thus,  it  can  be  easily  shown  that  the  number 
of  multiplications  per  output  for  J  linear  inverse 
operations  is  ( M  ^  - 1)  +  /  -H  M  ^  and  the  number  of 

multiplications  per  output  for  J  pure  nonlinear  operations 
is  given  by 


$x_j[k] = T(x_{j-1}[k]) = x_0[k] - \sum_{n=2}^{N} G_n(x_{j-1}[k])$  (13)

$X'_j = [x_{j-1}[k], x_j[k-1], \ldots, x_j[k-M_L-M+2]]$  (14)

For  all  W’lbis?-. 


f 


k  +  M-l 


+  ft] 


n’\'  M  —'1 


+  y£n  (18) 


For example, if $J = 3$, $N = 3$, $M = 4$, and $M_L = 10$, then the
number of multiplications required in Method II is 85 and
that needed in Method I is 220.


i7’(x)-r(>-)||=  I:(G„(a:)-G„().)) 


\\n=2 

N  n  Mo-lMe-l  Wc7-1 

«=2  k=\  /,=()  l2=()  /„=() 

N  n  Mc-lMa-l  Mc~l 

«=2  =0  4=0  /„=() 


8.(h . (15) 


itofOisk 


§ofOisk 


4.  Simulation  Results 

To  verify  the  performance  of  the  proposed  method,  two 
simulation  results,  obtained  by  analyzing  a  baseband 
satellite  communication  channel  model  and  a  sample-and- 
hold  circuit  model,  are  given  in  the  next  two  sections.  In 
particular,  note  that  a  post-compensator,  which  can  be 
also  designed  for  the  linearization  of  the  sample-and-hold 
circuit  has  the  same  structure  as  the  pre-compensator[l]. 


and the sufficient condition for $T(\cdot)$ to be a contraction
mapping [6] is given by


N  n  M.-IM,-}  M^-l 

«  =  III  I -  I 

n=2k=\  /,=0  1,-0  4=0 


g„(ii,...,4) 

#of  Oisk 


<  1 


(16) 


4-1.  Simulation  I 

In  this  simulation,  the  performance  of  the  proposed 
nonlinear  pre-compensator  is  tested  by  utilizing  a 
baseband  model  of  a  satellite  communication  channel  [4], 
as  shown  in  Fig.4:  the  transmit  and  receiver  filters  are 
given  by  TX=[0.8,  0.1]  and  RX=[0.9,  0.2,  0.1], 
respectively, and the nonlinearities of the TWT (traveling-wave
tube) may be characterized by the following
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5.  Conclusion 


AM/AM and AM/PM conversions:

$A(r) = \frac{\alpha_a r}{1 + \beta_a r^2}$  (19)

$\Phi(r) = \frac{\alpha_\phi r^2}{1 + \beta_\phi r^2}$  (20)

In eq.(19)-(20), $\alpha_a = 2$, $\beta_a = 1$, $\alpha_\phi = \pi/3$, $\beta_\phi = 1$, and
$r$ is the input amplitude. If $\theta$ is the phase of the TWT
input, then the amplitude and phase of the TWT output
can be expressed by $A(r)$ and $\Phi(r) + \theta$. Then, the
satellite  channel  can  be  modeled  by  Volterra  series  with 
odd-order  nonlinearities[3,4].  For  this  simulation,  its 
third-order  Volterra  model,  and  16-PSK  signals  with  0.68 
magnitude  (as  the  input  to  the  predistortion  compensator) 
are  applied.  The  simulation  results  are  given  in  Fig.  5(b) 
(i.e.,  the  compensated  output  when  Method  II  is  applied) 
and  in  Table  I  (i.e.,  NMSE  vs.  #  of  basic  blocks),  where 
the  efficiency  of  Method  II  is  indicated,  compared  with 
that  of  Method  I. 
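
For readers who want to reproduce the TWT nonlinearity, a minimal sketch follows. It assumes that eq.(19)-(20) take the common two-parameter form $A(r) = \alpha_a r/(1+\beta_a r^2)$ and $\Phi(r) = \alpha_\phi r^2/(1+\beta_\phi r^2)$ with the parameter values quoted in the text; the function name `twt` and the complex-baseband representation are choices made for this example, and the transmit and receive filters are omitted.

```python
import numpy as np

# Parameter values quoted in the text for eq.(19)-(20)
ALPHA_A, BETA_A = 2.0, 1.0
ALPHA_PHI, BETA_PHI = np.pi / 3, 1.0

def twt(z):
    """Apply the assumed AM/AM and AM/PM conversions to complex samples z."""
    z = np.asarray(z, dtype=complex)
    r = np.abs(z)          # input amplitude
    theta = np.angle(z)    # input phase
    amplitude = ALPHA_A * r / (1.0 + BETA_A * r**2)
    phase_shift = ALPHA_PHI * r**2 / (1.0 + BETA_PHI * r**2)
    return amplitude * np.exp(1j * (phase_shift + theta))

# 16-PSK symbols with magnitude 0.68, as used for the compensator input
symbols = 0.68 * np.exp(2j * np.pi * np.arange(16) / 16)
distorted = twt(symbols)
```

Because the input magnitude is constant for PSK, every symbol here receives the same gain and the same phase rotation; the memory of the channel comes only from the surrounding linear filters.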

4-2.  Simulation  II 

The  nonlinearity  in  a  sample-and-hold  circuit  may  be 
caused  by  the  input-independent  and  input-dependent 
timing  jitters,  and  the  relation  between  the  input  x[k]  and 
output  f(x[k])  can  be  approximated  as  follows[4]: 

f{x[k])^4k]+Y{^[k]-x[k-\]){l-\4k1i}  (21) 

where  c  is  a  constant  determined  by  the  circuit  parameters 
and  T  is  a  sampling  interval.  In  this  simulation,  c=0.02 
and  the  input  is  a  random  signal  uniformly  distributed 
between  -3.0  and  3.0.  The  simulation  results  are  shown  in 
Fig.  6,  with  the  convergent  property  of  the  NMSE  in  the 
compensated output when Method II is applied. Note that
the  two-block  compensator  shows  almost  the  same 
performance  as  the  three-block  one  does,  where  little 
performance  improvement  for  the  three-block  case  is 
observed  (this  is  closely  related  to  modeling  error,  as 
mentioned  in  eq.(9),  due  to  the  third-order  Volterra 
system  identification  of  the  sample-and-hold  circuit). 

For  another  test,  a  Volterra  model  of  the  sample-and- 
hold  circuit  (not  using  eq.(21))  with  10  MHz  sampling 
frequency  and  0.98  MHz  mono-tone  input  signal  [4]  is 
utilized.  In  this  case,  the  compensator  can  be  designed  by 
using  the  estimated  kernels  (i.e.,  by  finding  the  inverse  of 
the  Volterra  filter:  thus,  the  error  of  eq.(9)  does  not  exist 
in  this  case).  The  simulation  results  are  plotted  in  Fig.  7, 
where  it  is  shown  that  if  more  basic  blocks  are  added  to 
the system, higher-order nonlinear terms (e.g., 3rd, 5th
order, etc.) can be compensated for in a systematic way.


In  this  paper,  a  systematic,  but  simple,  design 
procedure  for  a  Volterra  series-based  adaptive 
preprocessor  (or  predistortor)  is  presented  to  linearize 
weakly  nonlinear  systems.  In  particular,  it  is  shown  that 
system  nonlinearities  up  to  a  required  order  can  be 
compensated  for  in  a  systematic  way  by  adding 
appropriate  basic  compensation  blocks  to  the 
corresponding  system,  and  the  computational  burden  for 
such  linearization  can  be  considerably  reduced  by 
utilizing  the  fact  that  each  block  in  the  compensator  has 
the  same  structure.  Finally,  the  performance  of  the 
proposed  approach  is  verified  by  utilizing  a  baseband 
satellite  communication  channel  model  and  a  sample-and- 
hold  circuit  model. 
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Fig.  1.  The  proposed  predistortion  compensator 
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(b)  block  T 


Fig. 2. The J-block compensator (Method I): (a) its
structure; (b) contraction mapping block T used in (a)
(the initial value of $x_j$ is $x_0$).


Table I. NMSE vs. # of blocks for the satellite
communication channel.

# of blocks (J)    Method I    Method II
1-block            0.00659     0.00595
2-block            0.00183     0.00153
3-block            0.00057     0.00050
4-block            0.00030     0.00028
5-block            0.00015     0.00015

Fig. 6. The NMSE in the compensated output of the
sample-and-hold model: (a) 1-block, (b) 2-block, and
(c) 3-block.


Fig. 3. The basic block T (Method II; here, the initial
value of $x_j$ is $x_0$).


Fig. 4. The baseband model of a satellite communication
channel


(a) 


(b) 


Fig. 7. Spectral distribution of the compensated
output of the sample-and-hold circuit with a mono-tone
(0.98 MHz) input signal: dotted line: 1-block;
dashed line: 3-block; solid line: uncompensated.


Fig. 5. The compensated output of a satellite
communication channel when Method II is used: (a)
1-block, (b) 5-block
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Abstract 

This  paper  forms  a  part  of  a  series  of  recent  studies  we 
have  undertaken,  where  the  problem  of  nonlinear  signal 
modelling is examined. We assume that the observed
"output" signal is derived from a Volterra filter that is
driven by a Gaussian input. Both the filter parameters and
the input signal are unknown and therefore the problem
can be classified as blind or unsupervised in nature. In the
statistical approach to the solution of the above problem
we seek equations that relate the unknown parameters
of the Volterra model to the statistical parameters of the
"output" signal to be modelled. These equations are
highly  nonlinear  and  their  solution  is  achieved  through  a 
novel  constrained  optimisation  formulation.  The  results  of 
the  entire  modelling  scheme  are  compared  with  recent 
contributions. 


1.  Introduction 

Nonlinear filters constitute an important part of statistical
signal processing for non-Gaussian processes since the use
of linear filters to extract signals of interest yields
suboptimal solutions in the presence of such processes.
Considerable  attention  has  been  focussed  in  published 
papers  (for  example  see  [1-5]),  on  the  subject  of  nonlinear 
signal  modelling  by  means  of  Volterra  representations. 

In  the  blind  form  of  the  problem,  only  the  output  is 
observable,  and  thus  for  any  subsequent  modelling  we 
must  use  only  the  measured  output  data.  A  fundamental 
assumption  underlying  many  of  the  approaches  to  the 
problem is to take the "input" to be a stationary random
process  with  Gaussian  statistics,  an  assumption  which 
allows  substantial  simplification  in  the  closed-form 
expressions  for  the  Volterra  kernels.  It  has  been  shown 
that,  for  a  zero  mean  white  Gaussian  input,  nonlinear 
expressions  for  the  linear  and  quadratic  transfer  functions 
can  be  derived  in  terms  of  various  spectral  moments  up  to 
third  order  (i.e.  the  bispectrum)  of  the  signal  [4-5]. 


The  outstanding  difficulty  in  such  modelling  is  related  to 
the  quest  for  finding  solutions  for  these  highly  nonlinear 
equations  that  yield  the  Volterra  kernels. 

In  this  contribution  we  employ  a  quadratic  Volterra  filter 
form  for  the  modelling  of  one  dimensional  signals. 

In  our  recent  studies  [4-5]  we  used  the  nonlinear  equations 
that  yield  the  Volterra  parameters  to  form  an  unconstrained 
optimisation  problem  which  was  solved  using  Lagrange 
Programming  Neural  Networks  (LPNN)  [6]. 

In  [8-9]  another  approach  is  proposed  for  the 
determination  of  the  Volterra  kernels  which  is  based  on 
constrained  optimisation  again  using  LPNN.  In  [8-9]  the 
contribution  is  that  we  pay  particular  attention  to  the 
reliability  of  the  statistical  measures  used  in  the  process. 
Indeed,  the  second  order  moments  are  known  to  be  more 
reliable  than  higher  moments.  Use  of  this  fact  is  made  in 
the  construction  of  the  constraints  and  penalty  functions 
for  the  optimisation  problem. 

In  this  paper  we  pay  again  particular  attention  to  the 
second  order  statistical  measures  but  in  a  different  sense. 
More  specifically  we  use  the  equations  that  relate  the 
unknown  parameters  of  the  model  with  the 
autocorrelations  of  the  signal  to  form  a  multivariate 
penalty  function.  This  function  is  incorporated  in  the  cost 
function  and  yields  a  so  called  transformation  minimisation 
function.  The  method  presented  in  this  study  belongs  to  a 
class  of  optimisation  methods  called  penalty- 
transformation  methods  [10].  These  seem  to  provide  better 
results  to  the  Volterra  parameter  estimation  problem  than 
the  Lagrange  methods,  as  far  as  robustness  and  rate  of 
convergence  are  concerned. 

2.  Preliminaries 

We  represent  the  signal  as  the  output  of  a  non  linear  time 
invariant  causal  system  driven  by  noise  x[n] .  The  Volterra 
representation  of  the  input/output  relationship  is  given  by 
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$y[n] = h_0 + \sum_i h_1[i]x[n-i] + \sum_i\sum_j h_2[i,j]x[n-i]x[n-j] + \sum_i\sum_j\sum_k h_3[i,j,k]x[n-i]x[n-j]x[n-k] + \ldots$

where $h_1[i]$, $h_2[i,j]$, $h_3[i,j,k]$, ... are the linear, quadratic,
cubic etc. filter weights or kernels respectively, $h_0$ is a
constant term whose value depends on the input $x[n]$ and
$0 \le i,j,k \le N-1$ where $N$ denotes the filter length [4-5].
As  a  linear  time-invariant  (LTI)  system  is  completely 
characterised  by  its  unit  impulse  response,  so  a  nonlinear 
system  which  can  be  represented  by  a  Volterra  series  is 
completely  characterised  by  its  Volterra  kernels. 
Identification  of  systems  which  contain  anything  higher 
than  second-order  kernels  is  a  very  difficult  task  because 
of  the  excessive  amount  of  computation.  In  this  paper  we 
consider  second  order,  causal,  Volterra  filters  with 
symmetric kernels. A symmetric kernel is a symmetric
function of its arguments, so that for $n$ arguments
there are $n!$ possible interchanges that leave
$h_n[i_1,\ldots,i_n]$ unchanged. Specifically, for
$n = 2$, $h_2[i,j]$ is symmetric if $h_2[i,j] = h_2[j,i]$.

The second order Volterra filter is given by the following
relationship:

$y[n] = \sum_i a[i]x[n-i] + \sum_i\sum_j b[i,j]x[n-i]x[n-j]$

where we assumed that $h_0 = 0$ without loss of generality
[5]. By assuming that the input signal $x[n]$ is a discrete,
stationary, zero mean, white Gaussian process, the output
process is also a discrete, stationary, but non-Gaussian
process and not necessarily zero mean. To simplify the
expressions we assume that the output is zero mean, in
which case the condition $\sum_i b[i,i] = 0$ must hold.

3.  Second  order  statistical  analysis 

In  this  section  the  second  order  statistical  analysis  of  the 
output  will  be  considered.  The  autocorrelation  function  of 
the real process $y[n]$ is given by $R[k] = E\{y[n]y[n+k]\}$
and in view of the second order Volterra model can be
written as:

$R[k] = \sum_i a[i]\,E\{x[n-i]y[n+k]\} + \sum_i\sum_j b[i,j]\,E\{x[n-i]x[n-j]y[n+k]\}$  (1)

The  terms  of  equation  (1)  involve  averaging  over  the 
product  of  one,  two,  three  and  four  Gaussian  random 
variables.  It  is  known  that  the  average  of  the  product  of  an 


odd  number  of  zero-mean  jointly  Gaussian  random 
variables  is  identically  zero  irrespective  of  their  mutual 
correlation.  Moreover,  the  average  of  the  product  of  an 
even  number  of  zero-mean  jointly  Gaussian  random 
variables  is  equal  to  the  summation  over  all  distinct  ways 
of  partitioning  the  random  variables  into  products  of 
averages of pairs. For example, if $x_1, x_2, x_3, x_4$ are zero-mean
jointly Gaussian random variables, then:

$E\{x_1 x_2 x_3\} = 0$  (2)

$E\{x_1 x_2 x_3 x_4\} = E\{x_1 x_2\}E\{x_3 x_4\} + E\{x_1 x_3\}E\{x_2 x_4\} + E\{x_1 x_4\}E\{x_2 x_3\}$  (3)
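
The pairing rule behind (2) and (3) can be checked mechanically. The sketch below enumerates all ways of splitting a set of zero-mean jointly Gaussian variables into pairs and sums the products of covariances; the helper names `pairings` and `gaussian_moment` and the covariance-matrix argument are illustrative choices, not notation from the paper.

```python
def pairings(items):
    """Yield every way of partitioning the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def gaussian_moment(indices, cov):
    """E{x_i1 x_i2 ... x_in} for zero-mean jointly Gaussian variables,
    where cov[a][b] = E{x_a x_b}."""
    if len(indices) % 2 == 1:
        return 0.0  # odd-order moments vanish, as in (2)
    total = 0.0
    for pairing in pairings(list(indices)):
        term = 1.0
        for a, b in pairing:
            term *= cov[a][b]
        total += term
    return total

num_pairings_4 = len(list(pairings([0, 1, 2, 3])))
num_pairings_6 = len(list(pairings([0, 1, 2, 3, 4, 5])))
```

Four variables give three pairings, as in (3), and six variables give fifteen, which is the count that appears later in the third-order analysis.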

With (2) and (3), $R[k]$ reduces to the form:

$R[k] = \beta \sum_i a[i]a[i+k] + 2\beta^2 \sum_i\sum_j b[i,j]\,b[i+k,j+k]$  (4)

where $\beta$ is the variance of the input driving noise.
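
As a sanity check on the autocorrelation expression $R[k] = \beta \sum_i a[i]a[i+k] + 2\beta^2 \sum_i\sum_j b[i,j]b[i+k,j+k]$, the sketch below compares it with a sample estimate from simulated data. The kernel values, the record length, the random seed, and all function names are assumptions made for this example; the quadratic kernel is symmetric with a zero diagonal so that the output is zero mean.

```python
import numpy as np

def theoretical_autocorr(a, b, beta, k):
    """Closed-form R[k] for a second-order Volterra filter with symmetric
    kernel b and zero-mean output, driven by white Gaussian noise."""
    N = len(a)
    linear = sum(a[i] * a[i + k] for i in range(N - k))
    quadratic = sum(b[i, j] * b[i + k, j + k]
                    for i in range(N - k) for j in range(N - k))
    return beta * linear + 2.0 * beta**2 * quadratic

def volterra2(x, a, b):
    """y[n] = sum_i a[i] x[n-i] + sum_i sum_j b[i,j] x[n-i] x[n-j],
    returned for n = N-1, ..., len(x)-1 so that no start-up samples are used."""
    N = len(a)
    y = np.zeros(len(x) - N + 1)
    for i in range(N):
        xi = x[N - 1 - i: len(x) - i]
        y += a[i] * xi
        for j in range(N):
            xj = x[N - 1 - j: len(x) - j]
            y += b[i, j] * xi * xj
    return y

def sample_autocorr(y, k):
    return np.mean(y[:len(y) - k] * y[k:])

# Hypothetical kernels and unit-variance driving noise
a = np.array([1.0, 0.5])
b = np.array([[0.0, 0.25], [0.25, 0.0]])
beta = 1.0

rng = np.random.default_rng(0)
x = np.sqrt(beta) * rng.standard_normal(200_000)
y = volterra2(x, a, b)
estimates = [sample_autocorr(y, k) for k in range(2)]
theory = [theoretical_autocorr(a, b, beta, k) for k in range(2)]
```

For these values the closed form gives R[0] = 1.5 and R[1] = 0.5, and the sample estimates should agree to within the usual Monte Carlo fluctuation.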

The autocorrelation function given by (4) is not sufficient
to solve the problem because the number of unknowns
present in these equations is much greater than the number
of useful samples of $R[k]$. For example, for the one
dimensional case, if the kernels $a[i]$ and $b[i,j]$ have
length $N$, then the number of equations provided is $N$,
while the number of unknowns is $N(N+3)/2 + 1$.
However, additional information can be provided by
examining higher order statistics [5].

4.  Third  order  statistical  analysis 

In this section additional information is obtained by examining the
higher order statistics of the output, and this information
can be employed towards the solution of the nonlinear
equations. If we define $M[k,l]$ to be the third order
moment sequence of $y[n]$, then [5]:

$M[k,l] = E\{y[n]y[n+k]y[n+l]\}$  (5)

In  the  following,  the  third-order  moment  sequence  of  the 
second  order  Volterra  filter  is  derived.  First  we  use  the 
following  symbols: 

$G_1[k] = \sum_i a[i]\,x[n-i+k]$  (6)

$G_2[k] = \sum_i\sum_j b[i,j]\,x[n-i+k]\,x[n-j+k]$  (7)

Based  on  (6),  (7)  one  can  easily  expand  (5)  in  the 
following  compact  form: 

$M[k,l] = E\{G_1[0]G_1[k]G_1[l] + G_1[0]G_1[k]G_2[l] + G_1[0]G_2[k]G_1[l] + G_1[0]G_2[k]G_2[l] + G_2[0]G_1[k]G_1[l] + G_2[0]G_1[k]G_2[l] + G_2[0]G_2[k]G_1[l] + G_2[0]G_2[k]G_2[l]\}$  (8)




The first, fourth, sixth and seventh terms of (8) involve
averaging over an odd number of zero-mean jointly
Gaussian random variables. Therefore they are identically zero.
Equation (8) then becomes:

$M[k,l] = E\{G_1[0]G_1[k]G_2[l] + G_1[0]G_2[k]G_1[l] + G_2[0]G_1[k]G_1[l] + G_2[0]G_2[k]G_2[l]\}$  (9)

Each  term  of  (9)  involves  averaging  over  an  even  number 
of  zero-mean  jointly  Gaussian  random  variables.  Keeping 
in  mind  the  procedure  we  described  in  the  previous 
paragraph,  one  can  decompose  the  average  of  the  product 
of  an  even  number  of  jointly  Gaussian  random  variables 
into  a  summation  of  products  of  averages  of  pairs.  The  first 
term of (9) (not using the fact that $b[i,j]$ is a symmetric
kernel), can then be written as follows:

$E\{G_1[0]G_1[k]G_2[l]\} = E\{(\sum_i a[i]x[n-i])(\sum_i a[i]x[n-i+k])(\sum_i\sum_j b[i,j]x[n-i+l]x[n-j+l])\}$

=  +  l]a[j  + 1  -  k]b[ij] 

i  J 

Now we define $\phi_1[k,l]$ to be as follows:

$\phi_1[k,l] = \sum_i\sum_j a[i]\,a[j]\,b[i+k,j+l]$

It can be proven [8] that

$E\{G_1[0]G_1[k]G_2[l]\} = 2\beta^2\phi_1[l,l-k]$  (10)

Similarly, one can show that:

$E\{G_1[0]G_2[k]G_1[l]\} = 2\beta^2\phi_1[k,k-l]$  (11)

$E\{G_2[0]G_1[k]G_1[l]\} = 2\beta^2\phi_1[-k,-l]$  (12)

The  fourth  term  of  (9)  is  quite  different  from  the  first  three 
terms.  It  involves  averaging  over  the  product  of  four 
Gaussian  random  variables  as  well  as  averaging  over  the 
product  of  six  Gaussian  random  variables.  The  latter  can 
be  broken  into  the  sum  of  fifteen  terms,  where  each  term 
involves  a  product  of  three  averages  of  distinct  pairs  of 
random  variables. 

By doing so and defining:

$\phi_2[x,y,z] = \sum_i\sum_j\sum_k b[i,j]\,b[i+x,k+y]\,b[j+z,k]$

we obtain:

$E\{G_2[0]G_2[k]G_2[l]\} = 8\beta^3\phi_2[l,l-k,k]$  (13)

which is valid only when $\sum_i b[i,i] = 0$ or $b[i,i] = 0, \forall i$. In
other cases equation (13) is more complicated.

We substitute (10), (11), (12), (13) into (9) and obtain
$M[k,l]$. It is now possible to use (4) in conjunction with
(9) to provide a sufficient number of nonlinear equations
required to solve for the unknown system parameters
$a$ and $b$ and the variance of the driving noise $\beta$.


5.  Lagrange  Programming  Neural  Networks 
(LPNN) 


LPNN  is  a  neural  network  primarily  designed  for  general 
nonlinear  programming.  The  methodology  is  based  on  the 
well-known  Lagrange  multiplier  method  for  general 
constrained  optimisation  problems.  The  essential 
components  of  the  approach  are  as  follows  [6-7]. 

Consider  the  following  constrained  nonlinear  programming 
problem: 

Minimise $X(f)$

Subject to $Y(f) = 0$

where $X: R^n \to R$ and $Y: R^n \to R^m$ are given functions
and $m < n$. The components of $Y$ are denoted $Y_1, \ldots, Y_m$.

We can define the Lagrangian function $L: R^{n+m} \to R$ by

$L(f,\lambda) = X(f) + \lambda^T Y(f)$

where $\lambda \in R^m$ is referred to as the Lagrangian multiplier.
The dynamic equations of the LPNN are then defined as:

$\frac{df}{dt} = -\nabla_f L(f,\lambda)$  (14)

$\frac{d\lambda}{dt} = \nabla_\lambda L(f,\lambda)$  (15)

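where

$\nabla_f L(f,\lambda) = \left[\frac{\partial L}{\partial f_1}, \frac{\partial L}{\partial f_2}, \ldots, \frac{\partial L}{\partial f_n}\right]^T$ and
$\nabla_\lambda L(f,\lambda) = \left[\frac{\partial L}{\partial \lambda_1}, \frac{\partial L}{\partial \lambda_2}, \ldots, \frac{\partial L}{\partial \lambda_m}\right]^T$, respectively.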


If the network is physically stable, then the equilibrium
point $(f^*,\lambda^*)$ defined by $df/dt = 0$ and $d\lambda/dt = 0$ satisfies
the equations:

$\nabla_f L(f^*,\lambda^*) = \nabla X(f^*) + \nabla Y(f^*)\lambda^* = 0$

$\nabla_\lambda L(f^*,\lambda^*) = Y(f^*) = 0$

and  thus  provides  a  Lagrange  solution  to  the  optimisation 
problem. 
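
The LPNN dynamics, gradient descent in $f$ combined with gradient ascent in $\lambda$, can be illustrated on a toy problem. The sketch below is not the Volterra estimation problem of this paper: the objective $X(f) = f_1^2 + f_2^2$, the constraint $Y(f) = f_1 + f_2 - 1 = 0$, the forward-Euler integration, and the step size are all assumptions chosen so that the equilibrium is known in closed form ($f^* = (0.5, 0.5)$, $\lambda^* = -1$).

```python
import numpy as np

def grad_X(f):
    # Gradient of the toy objective X(f) = f1^2 + f2^2
    return 2.0 * f

def Y(f):
    # Single equality constraint Y(f) = f1 + f2 - 1
    return np.array([f[0] + f[1] - 1.0])

def grad_Y(f):
    # n x m matrix whose columns are the gradients of the constraints
    return np.array([[1.0], [1.0]])

def lpnn(f0, lam0, step=0.01, num_steps=5000):
    """Forward-Euler integration of df/dt = -grad_f L and dlam/dt = grad_lam L,
    with L(f, lam) = X(f) + lam^T Y(f)."""
    f = np.array(f0, dtype=float)
    lam = np.array(lam0, dtype=float)
    for _ in range(num_steps):
        df = -(grad_X(f) + grad_Y(f) @ lam)  # variable neurons descend
        dlam = Y(f)                           # Lagrange neurons ascend
        f = f + step * df
        lam = lam + step * dlam
    return f, lam

f_star, lam_star = lpnn([0.0, 0.0], [0.0])
```

At the returned point both the stationarity condition and the constraint hold to numerical precision. For the nonconvex moment-matching costs used later, convergence of such dynamics depends on the regularity and convexity conditions mentioned in the text.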

Equations  (4)  furnish  a  set  of  relationships  for  the 
autocorrelations  while  (9)  provide  the  third  order  moments 
and  both  of  those  quantities  can  be  estimated  by  standard 
means  from  a  given  signal.  We  are  seeking  for  the 
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parameters  {a[i]] ,  {b[ij]}  and  yS  of  a  Volterra  model 

that  would  produce  these  second  and  third  order  moments. 
The problem is therefore as follows: Given the
autocorrelation estimates $\rho[k]$ and the third order
moment estimates $\mu[k,l]$ obtained from the signal
directly, determine $\{a[i]\}$, $\{b[i,j]\}$ and $\beta$.

5.1  Unconstrained  optimisation 

In  [4-5]  we  present  a  solution  to  the  above  problem  in  the 
LPNN  sense,  which  starts  initially  with  a  suitable  cost 
function  such  as  the  one  given  below 

$L(f) = \sum_i (\rho[i] - R[i,f])^2 + \sum_i\sum_j (\mu[i,j] - M[i,j,f])^2$

where $f = (a, b, \beta)^T$ is a vector formed by the unknown
parameters of the Volterra model and the unknown
variance of the driving noise, and $R[i,f]$, $M[i,j,f]$ are
the theoretical expressions for the autocorrelations and
third order moments respectively.

From  this  cost  function  and  in  accordance  with  the  above 
formulations,  the  LPNN  dynamic  equations  may  be  set  up 
as  in  (14).  Notice  that  no  constraints  have  been 
incorporated.  Thus,  the  Lagrange  parameters  are  set  to 
zero,  or  the  corresponding  Lagrange  neurons  are  clamped 
to  zero  level. 

The  signal  flow  graph  of  these  equations  (14)  describes  the 
required  dynamic  neural  network  structure  the  steady  state 
of  which  delivers  the  solution. 

5.2  Constrained  optimisation 

In  [8]  the  minimisation  is  carried  out  under  certain 
constraints  which  we  have  chosen  in  a  way  to  reflect  the 
accuracy  of  our  measurements  and  of  the  estimation 
procedures.  For  finite  duration  signals  autocorrelations  are 
more  accurately  estimated  than  higher  order  moments,  so  a 
constrained  nonlinear  programming  problem  is  formulated 
as  follows. 

Minimise: $L(f) = \sum_i\sum_j (\mu[i,j] - M[i,j,f])^2$

subject to: $\rho[i] = R[i,f]$

In this form we have a constrained optimisation problem
for which we form the following Lagrange function

$L(f,\lambda) = L(f) + \sum_i \lambda_i (\rho[i] - R[i,f])$

The dynamic equations of the LPNN (which are the
update equations for $f$ and $\lambda$) are now defined as in (14)
and (15).


The  stability  of  the  neural  network  and  the  optimality  of 
the  solution  are  guaranteed  under  some  regularity  and 
convexity  conditions  [6-7]. 

5.3  Penalty  methods 

In  [9]  we  pay  again  particular  attention  to  the  second  order 
statistical  measures  but  in  a  different  sense.  More 
specifically  we  use  the  equations  that  relate  the  unknown 
parameters  of  the  model  with  the  autocorrelations  of  the 
signal to form a penalty function [7]. This function is
incorporated into the cost function and yields a so-called
Augmented Lagrangian function. The problem is now as
follows.

L(/,A) = L(/) 

i  i 

where $\{c_k\}$ is a penalty parameter sequence satisfying
$0 < c_k < c_{k+1}\ \forall k$, $c_k \to \infty$.

The  development  of  the  above  method  was  motivated  by 
the  concept  of  maintaining  implicit  control  over  the 
violations  of  constraints  by  penalising  the  objective 
function  at  points  which  violate  or  perhaps  tend  to  violate 
the  constraints. 

The  methods  presented  in  [8-9]  provide  solutions  closer  to 
the  optimum  than  the  unconstrained  method  in  [4-5]. 

5.4  Penalty-transformation  methods 

In  this  study  the  problem  is  again  formulated  as  in  [8-9]. 
Minimise: 

i  j 

subject  to: 

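$\gamma_i(f) = \rho[i] - R[i, f] = 0, \quad i = 0, \ldots, N-1$

We now use a new penalty function [9] defined by

$P(\gamma(f), \theta, \sigma) = \sum_{i=0}^{N-1} \sigma_i \, (\theta_i + \gamma_i(f))^2$

where $\theta$ and $\sigma$ are $N$-vectors of controlling parameters
and $N$ is the length of the Volterra model.

The transformation function now becomes

$T(f, \theta, \sigma) = L(f) + \sum_{i=0}^{N-1} \sigma_i \, (\theta_i + \gamma_i(f))^2$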

This method is based on the observation that if $f^{(k)}$
minimizes $T(f, \theta, \sigma)$, then $f^{(k)}$ is also a solution of the
problem: Minimise $L(f)$ subject to $\gamma(f) = \gamma(f^{(k)})$.

The algorithm we use is as follows.

(i)  We  select  =  0,  =  1,  =  0 
(ii) $k = k + 1$

(iii)  Minimise  T(f  to  find  starting  the 

unconstrained  minimisation  from 

(iv)  If  is  converging  sufficiently  rapidly  to  zero, 

then 

where  0  <  c  <  1  and  return  to  (ii). 

Otherwise 

0(k+\) 

where  fi>l  and  return  to  (ii). 

In  this  method  the  penalty  parameters  that  are  incorporated 
into  the  cost  function  are  changing  according  to  the  change 
of  the  cost  function  which  is  under  explicit  control.  By 
doing  this  we  try  to  avoid  the  possibility  that  the  cost 
function  will  diverge  because  of  an  independent  increase 
of  the  penalty  parameters. 
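
The parameter-update logic of steps (i)-(iv) can be sketched on a toy problem. The block below is a minimal Python illustration, not the implementation used in this study: the update rule is assumed to be the classical one for this kind of penalty function (shift $\theta$ by the constraint value when the violation shrinks by at least a factor `c`, otherwise multiply $\sigma$ by `beta` and divide $\theta$ by `beta`), and the cost `cost`, the single constraint `gamma`, the inner minimiser `minimise` and every parameter value are hypothetical choices made only for this example.

```python
import math

def cost(f):
    # Hypothetical cost L(f), chosen only for illustration.
    return f[0] ** 2 + f[1] ** 2

def cost_grad(f):
    return [2.0 * f[0], 2.0 * f[1]]

def gamma(f):
    # Hypothetical single equality constraint gamma(f) = 0.
    return f[0] + f[1] - 1.0

def gamma_grad(f):
    return [1.0, 1.0]

def minimise(fun, grad, x0, iters=500):
    # Unconstrained gradient descent with Armijo backtracking.
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        g2 = sum(gi * gi for gi in g)
        if g2 < 1e-24:
            break
        t = 1.0
        fx = fun(x)
        while fun([xi - t * gi for xi, gi in zip(x, g)]) > fx - 0.5 * t * g2:
            t *= 0.5
        x = [xi - t * gi for xi, gi in zip(x, g)]
    return x

def penalty_transformation(f0, outer=8, c=0.25, beta=10.0):
    theta, sigma = 0.0, 1.0          # assumed starting values
    f = list(f0)
    best = math.inf                  # smallest accepted violation so far
    for _ in range(outer):
        def T(x):
            # T(f, theta, sigma) = L(f) + sigma * (theta + gamma(f))^2
            return cost(x) + sigma * (theta + gamma(x)) ** 2

        def T_grad(x):
            s = 2.0 * sigma * (theta + gamma(x))
            return [a + s * b for a, b in zip(cost_grad(x), gamma_grad(x))]

        f = minimise(T, T_grad, f)   # warm start from the previous minimiser
        viol = abs(gamma(f))
        if viol <= c * best:
            theta += gamma(f)        # violation shrinking fast enough
            best = viol
        else:
            sigma *= beta            # otherwise strengthen the penalty
            theta /= beta
    return f, theta, sigma

f_opt, theta_opt, sigma_opt = penalty_transformation([0.0, 0.0])
```

For this toy problem the constrained minimiser is $f = (0.5, 0.5)$, and the penalty parameter is only increased when the constraint violation stops shrinking quickly enough.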

6.  Simulations 

In  the  simulations  we  use  a  synthetic  one  dimensional 
signal  which  is  described  by  a  quadratic  of  the  following 
form: 

$y[n] = x[n] + 1.8\,x[n-1] + 0.5\,x[n]\,x[n-1]$
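
A signal of this kind can be generated with a few lines of Python. This is only a sketch under stated assumptions: the coefficients 1, 1.8 and 0.5 are taken from the "real value" column of the tables in this section, the input is assumed to be zero-mean white Gaussian (the text does not specify it), and the names `quadratic_model` and `autocorrelation` are illustrative.

```python
import random

def quadratic_model(x, a0=1.0, a1=1.8, b01=0.5):
    # y[n] = a0*x[n] + a1*x[n-1] + b01*x[n]*x[n-1], with x[-1] taken as 0.
    y = []
    prev = 0.0
    for xn in x:
        y.append(a0 * xn + a1 * prev + b01 * xn * prev)
        prev = xn
    return y

def autocorrelation(y, max_lag):
    # Biased sample autocorrelation r[k] = (1/N) * sum_n y[n] * y[n+k].
    n = len(y)
    return [sum(y[i] * y[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]

rng = random.Random(0)                      # fixed seed for repeatability
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = quadratic_model(x)
r = autocorrelation(y, 3)                   # second-order statistics of y
```

The sample autocorrelations `r` play the role of the second-order measures that the constraints are built from.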

The size of the signal is 1000 samples. We apply both the
constrained optimisation approach introduced in [8] and
the penalty-transformation approach presented in this
paper to this signal. Since we are dealing with nonlinear and
nonconvex functions, we might encounter difficulties
concerning the convergence of the algorithm. In this work
we repeat the same experiments starting from different
initial points for the unknown parameters, chosen to lie in
the basin of attraction of the global solution, which can be
approximately determined using simulated annealing
algorithms. The mean values and the variances of the
estimates are calculated. The first table below shows the
solutions obtained with the method presented in [8].
The second table contains the results for the same
experiments using the procedure described in this
study.

It is observed from the tables below that the new approach
yields improved results, as also expected theoretically,
compared to the method in [8] and, by extension, to the
methods in [4-5]. These results arise from a large number
of tests involving different signals.


Parameter   Real value   Mean estimated value   Variance
a[0]        1            1.1652                 0.0014
a[1]        1.8          2.05                   0.0021
b[0,1]      0.5          0.3744                 5.8518e-04
            1.55         1.5834                 0.0031

Parameter   Real value   Mean estimated value   Variance
a[0]        1            0.92                   1.459e-04
a[1]        1.8          2.056                  0.001
b[0,1]      0.5          0.47                   1.92e-04
            1.55         1.5258                 0.0021
Abstract

In this paper, we propose an iterative mixed norm image
restoration algorithm. A functional which combines the
least mean squares (LMS) and the least mean fourth (LMF)
functionals is proposed. A function of the kurtosis is used to
determine the relative importance between the LMS and the
LMF functionals. An iterative algorithm is utilized for
obtaining a solution and its convergence is analyzed.
Experimental results demonstrate the capability of the proposed
approach.


1.  Introduction 

The image restoration problem has been extensively
studied in the past years [1, 5]. In general, the mean squared
error (MSE) norm has been used for formulating the restoration
problem, resulting in the least mean squared (LMS) approach.
The reason for this is that it leads to mathematically
tractable solutions and yields optimal results when the
contaminating noise has a Gaussian distribution.

In a number of applications, the contaminating noise may
be non-Gaussian or a combination of several noise types
[4]. In such cases, norms with order higher than two have
been used in adaptive filtering [2, 6, 7]. It has been shown
that the LMF approach outperforms the LMS under certain
noise distributions, such as sub-Gaussian distributions [7].

In this paper, we propose a mixed norm image restoration
algorithm to combine the advantages of the LMS and
the LMF in dealing with several noise distributions. In order
to control the relative contribution of the LMS and the
LMF, a functional of the kurtosis is used.

This paper is organized as follows. In section 2 the
mixed norm algorithm is proposed. In section 3 an iterative
algorithm is described and its analysis is presented. In
section 4 various forms of the functional of the kurtosis are
discussed. Finally, experimental results are shown in section 5
and conclusions are discussed in section 6.


2.  Mixed  norm  algorithm 

An $M \times N$ dimensional image may be degraded due to
uniform motion, defocusing, long-term atmospheric turbulence,
or a combination of them [1]. The typical degradation
model has the form

$y = Hx + n$, (1)

where vectors $y$, $x$, $n$ are of size $MN \times 1$, and represent
the lexicographically ordered observed degraded image, the
original image, and the additive noise, respectively. $H$ is the
degradation matrix of size $MN \times MN$ which may represent
a spatially invariant or spatially varying degradation.

For most applications the noise term $n$ is assumed to be
zero-mean Gaussian. There are applications, however, for
which the additive noise is characterized by other distributions,
such as uniform or Cauchy, or by a combination of
them. In the Gaussian noise case an LMS approach is used.
For other noise distributions, however, norms of higher order
need to be used.

The work we present in this paper is motivated by the
use of mixed norm in adaptive filtering [6, 7]. We therefore
propose to minimize the following functional $M(x)$ with
respect to $x$, for obtaining an estimate of the restored image

$M(x) = (1 - \gamma(x)) \|y - Hx\|_2^2 + \gamma(x) \|y - Hx\|_4^4$, (2)

where $\gamma(\cdot)$ ($0 \le \gamma(x) \le 1$) is a parameter controlling the
relative importance of the two terms and $\|\cdot\|_p$ denotes the $l_p$
norm.
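
To make the functional of Eq. (2) concrete, the following minimal Python sketch evaluates it for small dense arrays stored as lists. The function names `matvec` and `mixed_norm` are illustrative, and $\gamma$ is passed in as a fixed number here rather than computed from the data.

```python
def matvec(H, x):
    # Dense matrix-vector product H x for lists of lists.
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]

def mixed_norm(y, H, x, gamma):
    # M(x) = (1 - gamma) * ||y - Hx||_2^2 + gamma * ||y - Hx||_4^4
    r = [yi - hi for yi, hi in zip(y, matvec(H, x))]
    l2 = sum(ri ** 2 for ri in r)
    l4 = sum(ri ** 4 for ri in r)
    return (1.0 - gamma) * l2 + gamma * l4

identity = [[1.0, 0.0], [0.0, 1.0]]
value = mixed_norm([1.0, 2.0], identity, [0.0, 0.0], 0.5)
```

With `gamma = 0` the functional reduces to the LMS criterion and with `gamma = 1` to the LMF criterion.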

As is the case in most image restoration applications, we
assume that only one realization of the degraded image $y$ is
available. Then, if we assume that $n$ is ergodic, $\|y - Hx\|_p^p$
(divided by the number of pixels in the image) provides an
estimate of $E[(y - Hx)_i^p]$, where $E[\cdot]$ denotes statistical
expectation and $(z)_i$ is the $i$th element of a vector $z$.

The performance of the LMS and LMF algorithms has
been investigated in the literature [2, 4, 7]. It was shown
that under certain noise conditions (e.g., sub-Gaussian signals),
LMF and other higher-order criteria exhibit improved performance
compared to the LMS. The converse is true for
Gaussian and super-Gaussian signals. By using Eq. (2),
therefore, it is desired that in the extreme cases of only
Gaussian or super-Gaussian noise the contribution of the
fourth norm is negligible (i.e., $\gamma(x) \approx 0$), while for
sub-Gaussian noise the relative contribution of the fourth norm
is large (i.e., $\gamma(x) \approx 1$). On the other hand, for cases of mixed
noise, $\gamma(x)$ will attain values which will control the relative
contribution of the two norms. For all cases and for
most practical situations, however, the noise distribution is
not known. Therefore, it is desirable that the values of $\gamma(x)$
be determined using the available data.

A parameter which determines the Gaussianity of a signal
is the kurtosis. For a random variable $r$ it is defined by

$\kappa(r) = E[r^4] - 3E^2[r^2]$. (3)

The kurtosis is zero for a Gaussian signal; it is positive for
super-Gaussian (leptokurtic) signals and negative for
sub-Gaussian (platykurtic) signals. In analogy to Eq. (3), we
define the kurtosis of a signal when only one realization is
available by


$\kappa(r) = \frac{\|r\|_4^4}{MN} - 3 \left( \frac{\|r\|_2^2}{MN} \right)^2$. (4)
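
A single-realization kurtosis estimate can be sketched as follows. This assumes that the estimate simply replaces the expectations of Eq. (3) by sample averages over the available samples; the function name `sample_kurtosis` is illustrative.

```python
import random

def sample_kurtosis(r):
    # Sample analogue of E[r^4] - 3 * E[r^2]^2 for a zero-mean sequence r.
    n = len(r)
    m2 = sum(v ** 2 for v in r) / n
    m4 = sum(v ** 4 for v in r) / n
    return m4 - 3.0 * m2 ** 2

rng = random.Random(1)
uniform_noise = [rng.uniform(-1.0, 1.0) for _ in range(10000)]
k_uniform = sample_kurtosis(uniform_noise)   # negative: sub-Gaussian
```

For uniform noise the estimate is negative, which is the sub-Gaussian (platykurtic) case discussed above.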


Based on the above discussion we propose to determine
$\gamma(x)$ as a function of $\kappa(\cdot)$. A number of forms of this
function have been investigated, and are discussed in section 4.


3.  Iterative  solution  and  convergence  analysis 


3.1  Development 


We propose to use a steepest-descent algorithm for
obtaining a solution to the minimization problem of Eq. (2).
The gradient of $M(x)$ with respect to $x$ is equal to

$\nabla_x M(x) = -2(1 - \gamma(x)) H^T (y - Hx) - 4\gamma(x) H^T (y - Hx)^3 + \left[ -\nabla_x \gamma(x) \|y - Hx\|_2^2 + \nabla_x \gamma(x) \|y - Hx\|_4^4 \right]$. (5)

A closer look at the last term, inside the brackets, reveals
that it is very small, and therefore it is omitted in the
following analysis. It is also confirmed experimentally that
the restoration results are indistinguishable with and without
the use of the last term. The resulting iteration then
takes the form

$x_{k+1} = x_k + \beta \left[ (1 - \gamma(x_k)) H^T (y - Hx_k) + 2\gamma(x_k) H^T (y - Hx_k)^3 \right]$
$\qquad = x_k + \beta H^T \left[ (1 - \gamma(x_k)) I + 2\gamma(x_k) P(x_k) \right] (y - Hx_k)$, (6)

where $\beta$ is the relaxation parameter, which controls the
convergence as well as the convergence rate, and $P(x_k)$
is an $MN \times MN$ diagonal matrix with diagonal elements
$P(x_k)_{i,i} = (y_i - (Hx_k)_i)^2$.
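
Iteration (6) can be sketched for a tiny noise-free example. This is an illustration only, with several simplifying assumptions: $\gamma$ is held fixed instead of being updated from the kurtosis, $H$ is a hypothetical 2x2 blur with maximum singular value 1, and the names `transpose`, `matvec` and `mixed_norm_restore` are illustrative.

```python
def matvec(H, x):
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]

def transpose(H):
    return [list(col) for col in zip(*H)]

def mixed_norm_restore(y, H, gamma, beta, iters):
    Ht = transpose(H)
    x = list(y)                                  # start from the observation
    for _ in range(iters):
        r = [yi - hi for yi, hi in zip(y, matvec(H, x))]
        # [(1 - gamma) I + 2 gamma P(x_k)] (y - H x_k), with P_ii = r_i^2
        w = [((1.0 - gamma) + 2.0 * gamma * ri * ri) * ri for ri in r]
        step = matvec(Ht, w)
        x = [xi + beta * si for xi, si in zip(x, step)]
    return x

H = [[0.6, 0.4], [0.4, 0.6]]                     # hypothetical blur matrix
x_true = [1.0, 2.0]
y = matvec(H, x_true)                            # noise-free observation
x_hat = mixed_norm_restore(y, H, gamma=0.5, beta=1.0, iters=2000)
```

Because the example is noise-free and $H$ is invertible, the iterate approaches `x_true`; with noise the fixed point depends on $\gamma$.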


3.2  Convergence  analysis 


In this analysis we follow steps similar to the ones in [3].
Iteration (6) for two consecutive values can be rewritten as

$x_{k+1} - x_k = (x_k - x_{k-1}) + \beta \left[ -H^T H (x_k - x_{k-1}) - H^T (F(x_k) - F(x_{k-1})) + H^T H (G(x_k) - G(x_{k-1})) + 2H^T (K(x_k) - K(x_{k-1})) - 2H^T (L(x_k) - L(x_{k-1})) \right]$, (7)

where $F(x_k) = \gamma(x_k) y$, $G(x_k) = \gamma(x_k) x_k$, $K(x_k) =
\gamma(x_k) P(x_k) y$, and $L(x_k) = \gamma(x_k) P(x_k) H x_k$. The nonlinear
factors $F(x_k)$, $G(x_k)$, $K(x_k)$, and $L(x_k)$ can be linearized,
that is,

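$F(x_k) - F(x_{k-1}) \approx J_F(x_k)(x_k - x_{k-1})$,

$G(x_k) - G(x_{k-1}) \approx J_G(x_k)(x_k - x_{k-1})$,

$K(x_k) - K(x_{k-1}) \approx J_K(x_k)(x_k - x_{k-1})$,

and

$L(x_k) - L(x_{k-1}) \approx J_L(x_k)(x_k - x_{k-1})$,

where $J_F(x_k)$, $J_G(x_k)$, $J_K(x_k)$, and $J_L(x_k)$ are the
Jacobian matrices. Their $mn$th elements, denoted by
$J_F(x_k)_{m,n}$, $J_G(x_k)_{m,n}$, $J_K(x_k)_{m,n}$, and $J_L(x_k)_{m,n}$,
are equal to

$J_F(x_k)_{m,n} \approx 0$,

$J_G(x_k)_{m,n} \approx \gamma(x_k)\, \delta_{m,n}$, (8)

$J_K(x_k)_{m,n} \approx \gamma(x_k) \frac{\partial P(x_k)_{m,m}}{\partial x_{k,n}} y_m$,

and

$J_L(x_k)_{m,n} \approx \gamma(x_k) \left[ \frac{\partial P(x_k)_{m,m}}{\partial x_{k,n}} \sum_i h_{m,i} x_{k,i} + P(x_k)_{m,m} h_{m,n} \right]$, (9)

where $\delta_{m,n} = 1$ when $m = n$ and $0$ otherwise. From Eqs. (8) and (9),

$J_K(x_k)_{m,n} - J_L(x_k)_{m,n} \approx \gamma(x_k) \left[ -2 \left( y_m - \sum_i h_{m,i} x_{k,i} \right) h_{m,n} y_m + 2 \left( y_m - \sum_i h_{m,i} x_{k,i} \right) h_{m,n} \sum_i h_{m,i} x_{k,i} - \left( y_m - \sum_i h_{m,i} x_{k,i} \right)^2 h_{m,n} \right]$
$\qquad = -3\gamma(x_k) P(x_k)_{m,m} h_{m,n}$. (10)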
Therefore, Eq. (7) can be rewritten as


5.  Experimental  results 


$x_{k+1} - x_k = \left[ I - \beta \left( (1 - \gamma(x_k)) H^T H + 6\gamma(x_k) H^T P(x_k) H \right) \right] (x_k - x_{k-1})$. (11)

Considering the norm of both sides of Eq. (11), the
sufficient condition for convergence becomes

$\| I - \beta ((1 - \gamma(x_k)) H^T H + 6\gamma(x_k) H^T P(x_k) H) \| < 1$. (12)

Since $(1 - \gamma(x_k)) H^T H + 6\gamma(x_k) H^T P(x_k) H$ is a positive
definite matrix, this condition becomes

$| 1 - \beta \lambda_{\max}((1 - \gamma(x_k)) H^T H + 6\gamma(x_k) H^T P(x_k) H) | < 1$, (13)

where $\lambda_{\max}(A)$ denotes the maximum singular value of the
matrix $A$. Condition (13) forms a sufficient condition for
convergence which needs to be satisfied at each iteration
step. Using the triangle inequality this condition becomes

$0 < \beta < \frac{2}{(1 - \gamma(x_k)) + 6\gamma(x_k) \lambda_{\max}(P(x_k))}$, (14)

because $\lambda_{\max}(H^T H) = \lambda_{\max}(H^T) = \lambda_{\max}(H) = 1$.
Since $0 \le \gamma(x_k) \le 1$ and $\lambda_{\max}(P(x_k))$ is greater than one,
an even tighter upper bound for $\beta$ is $\frac{2}{6\lambda_{\max}(P(x_k))}$.
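
As a small worked illustration of the step-size condition, the sketch below computes an upper limit for $\beta$ from the current residual. It assumes the bound has the form $2 / ((1 - \gamma) + 6\gamma\lambda_{\max}(P))$ with $\lambda_{\max}(H^T H) = 1$, and uses the fact that $P$ is diagonal, so its largest singular value is its largest diagonal entry. The function names are illustrative.

```python
def p_max_from_residual(residual):
    # P is diagonal with entries r_i^2, so lambda_max(P) = max_i r_i^2.
    return max(ri * ri for ri in residual)

def beta_upper_bound(gamma, p_max):
    # Assumed form of the bound: 2 / ((1 - gamma) + 6 * gamma * lambda_max(P)).
    return 2.0 / ((1.0 - gamma) + 6.0 * gamma * p_max)

bound_lms = beta_upper_bound(0.0, 5.0)   # pure LMS: bound does not depend on P
bound_lmf = beta_upper_bound(1.0, p_max_from_residual([1.0, -2.0]))
```

The bound shrinks as the residual grows, which is why the condition has to be checked at each iteration step.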

4.  Choice  of  mixed  norm  parameter 


The desirable properties of $\gamma(x)$ were mentioned in section 2.
In addition, in section 3, we established upper and
lower bounds for $\gamma(x)$ based on the convergence analysis of
the resulting iteration.

A number of functions defining $\gamma(x)$ in terms of the
kurtosis were tested. Two of them are the following, with
$n_k = y - Hx_k$,


$\gamma(x_k) = \gamma(x_{k-1}) - c_1 \kappa(n_k), \quad c_1 > 0$, (15)

(16) 

H-exp(-C4x(njb)) 

In both cases, $\gamma(x_k)$ is thresholded so that it belongs to the
range of values [0, 1].

The updating equations (15) and (16) ensure that for
values $\kappa(n_k) > 0$ (leptokurtic noise distribution) the
importance of the fourth norm is reduced relative to the second
norm. For values of $\kappa(n_k) < 0$ (platykurtic noise
distribution) the importance of the fourth norm is increased
relative to the second norm. The important advantage of
the proposed algorithm is that the relative importance of the
two norms is adjusted automatically based on the partially
restored image, without the need to specify in advance the
noise distribution(s). This is particularly important when
mixtures of noise distributions are used.
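
The recursive update of Eq. (15), followed by the thresholding to [0, 1], can be sketched as below. The function name `update_gamma` and the numerical values in the example are illustrative only.

```python
def update_gamma(gamma_prev, kurtosis, c1):
    # gamma(x_k) = gamma(x_{k-1}) - c1 * kappa(n_k), then clipped to [0, 1].
    g = gamma_prev - c1 * kurtosis
    return min(1.0, max(0.0, g))

g_lepto = update_gamma(0.5, 2.0, 0.1)     # positive kurtosis lowers gamma
g_platy = update_gamma(0.5, -10.0, 0.1)   # negative kurtosis raises gamma
```

Positive kurtosis moves the weight towards the second norm, negative kurtosis towards the fourth norm.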


In our experiments, we used the 256x256 pixel image
shown in Fig. 1. The original image is blurred by 7x7 uniform
motion blur. Uniform (sub-Gaussian) and/or Cauchy
(super-Gaussian) distributed noise signals were added to the
blurred image. We tested the proposed algorithm for various
signal to noise ratios (SNR). For evaluating the performance
of the algorithm, the improvement in SNR (dB) was
utilized. It is defined at the $k$th iteration by

$\Delta_{SNR} = 10 \log_{10} \frac{\|y - x\|^2}{\|x_k - x\|^2}$. (17)


The  criterion 


||xfc-n  ~  ^kWl  ^  -irj-e 


(18) 


was  used  for  terminating  the  iteration. 

In Figs. 2 and 3, the noisy and blurred images with 20
dB Cauchy noise and 20 dB uniform noise are shown,
respectively. The corresponding restored images by the
proposed algorithm, where $\gamma(x)$ in Eq. (16) was used with
$c_4$ = 10"®, are shown in Figs. 4 and 5. In Table 1,
comparative results are shown. In all cases the blur used is the
7x7 uniform blur. The combination noise is the one resulting
from equal contributions of uniform and Cauchy noise.
The LMS and LMF approaches are compared with the
results of the proposed algorithm. LMMN1 and LMMN2
denote the proposed algorithm with the use, respectively,
of Eqs. (15) and (16), with $c_1$ = $c_4$ = 10“®.

By comparing the entries of Table 1 it is clear that the
proposed algorithm performs similarly to the LMF algorithm
for the uniform noise case and similarly to the LMS
algorithm for the Cauchy case, as expected. For the combined
noise case, the proposed algorithm (LMMN2) performs
slightly better than both the LMS and LMF.

The important point to be made here is that with the
proposed algorithm no knowledge of the noise distribution is
required, and the relative contribution of the LMS and LMF
approaches is adjusted based on the partially restored image.

The logarithm of the error between the original and
restored images is compared for one experiment for the LMS,
LMF and the proposed LMMN2 approaches in Fig. 6. The
LMMN2 and LMF approaches converge to the
same point, since the noise is uniformly distributed.

The values of the parameter $\gamma(x_k)$ as a function of the
iteration number are shown in Figs. 7 and 8. In Fig. 7,
for both noise cases, $\gamma(x_k)$ is close to 1, that is, only the
$l_4$ term is used, as expected. In Fig. 8, although the noise
is uniform, $\gamma(x_k)$ is about 0.5, giving equal contributions to
the $l_2$ and $l_4$ norms. This is explained by the fact that the
SNR is relatively high (20 dB) and therefore the absolute
value of the kurtosis is not as large as for the 10 dB case
(the kurtosis is negative in both cases).

6.  Conclusions 

In this paper, we proposed an iterative mixed norm
restoration algorithm. The algorithm performs similarly to
the LMS approach for Gaussian and super-Gaussian noise
distributions and similarly to the LMF approach for
sub-Gaussian noise distributions and for mixed noise
distributions. The functional which determines the relative
importance between the $l_2$ and $l_4$ norms in the mixed norm
formulation is automatically adjusted per iteration based on a
measure of the kurtosis of the noise using the partially restored
image. Therefore, no prior knowledge of the noise distribution is
required. The addition of a smoothing term weighted by the
regularization functional in the functional to be minimized
is currently under investigation.
Figure 1. Original image

Figure 2. Noisy blurred image (7x7 uniform blur, 20 dB Cauchy noise)

Figure 3. Noisy blurred image (7x7 uniform blur, 20 dB uniform noise)

Figure 4. Restored image of Fig. 2 by proposed algorithm, $\Delta_{SNR}$ = 2.31 dB

Figure 5. Restored image of Fig. 3 by proposed algorithm, $\Delta_{SNR}$ = 2.68 dB


Noise type    SNR (dB)   LMS       LMF       LMMN1     LMMN2
                         Δ_SNR     Δ_SNR     Δ_SNR     Δ_SNR
Uniform       10         2.98      3.51      3.35      3.41
Uniform       20         2.30      2.71      2.58      2.68
Uniform       30         3.56      3.79      3.66      3.75
Cauchy        10         2.82      1.03      2.81      2.82
Cauchy        20         2.31      0.58      2.31      2.31
Cauchy        30         3.48      1.79      3.43      3.46
Gaussian      10         2.94      2.83      2.90      2.98
Gaussian      20         2.31      2.26      2.28      2.32
Gaussian      30         3.54      3.50      3.51      3.54
Combination   10         2.95      3.33      3.30      3.40
Combination   20         2.30      2.57      2.53      2.66
Combination   30         3.61      3.78      3.75      3.81

Table 1. Performance comparison
Figure 6. Mean squared error comparison; 20

Figure 7. Values of $\gamma(x)$ as function of iteration number

Figure 8. Values of $\gamma(x)$ as function of iteration
Abstract

A practical method for identifying cubically nonlinear
systems is presented in this paper. This method
identifies the system by using the higher-order spectra
of the system input and output. Compared to the
conventional method, which requires the system output
to be sampled at six times the bandwidth of the input,
the proposed method only requires the system output
to be sampled at twice the bandwidth of the system input.
This greatly reduces the required computation and
processing speed of the circuits. The advantage of the
proposed method over the conventional one is demonstrated
via computer simulation.


1.  Introduction 

The Volterra series [1, 2, 3, 4, 5] is a well studied
model for describing nonlinear systems. In modeling
nonlinear systems using Volterra series, the number of
coefficients required in the model increases dramatically with the
order of the Volterra series. Therefore, the practical
application of the Volterra model is often limited to
mildly nonlinear systems which can be modeled by a
lower-order Volterra series. However, since the third-order
Volterra series is the lowest-order Volterra model
including both even and odd order nonlinear terms, the
technique of using the Volterra model for nonlinear system
identification needs to be extended up to at least third
order. In fact, one may find in the literature that many
nonlinear effects in science and engineering can be
appropriately described by a cubically nonlinear system
and have been successfully modeled by a third-order
Volterra series [6, 7, 8]. For such applications, a reliable
and practical method for identifying the cubically
nonlinear systems will be appreciated.


Building upon the many quadratically nonlinear system
identification techniques [9, 10, 11, 12], the development
of identification techniques for cubically nonlinear
systems is conceptually straightforward. However,
it is by no means trivial due to the great complexity
associated with working in a 3-dimensional
space. For determining the Volterra transfer functions
of cubically nonlinear systems, a higher-order-spectrum
[13, 14] based method was developed [15].
This method is applicable to a broader class of nonlinear
problems in the sense that it doesn't assume
particular statistics for the input and thus allows a
larger variety of input characteristics (Gaussian and
non-Gaussian).

Although the method developed in [15] (referred to
as the conventional method in the remainder of this
paper) is quite effective in determining Volterra transfer
functions of a cubically nonlinear system, in digital
implementation it requires the system output signal to
be sampled at a frequency which is at least 6 times the
bandwidth of the system input (in contrast to twice
the bandwidth for the linear system case) [15]. Using
such a high sampling frequency has two disadvantages.
First, 3-times faster A/D converters and processing circuits
are required. This suggests that expensive circuitry
may be required to implement these methods.
Second, for the Volterra transfer functions to achieve a
required frequency resolution, a 3-times larger number
of DFT (discrete Fourier transform) points is required.
This increases the computational complexity of these
methods since the required computation increases with
the number of DFT points.

It is the objective of this paper to develop a practical
technique for identification of cubically nonlinear systems
with a random input using the input/output data
sampled at the Nyquist sampling rate [16] of the input.
Note that this sampling rate is the minimum one without
leading to aliasing in the input data samples and
thus is called the minimum sampling rate in this paper.
Although the input data samples are immune from
aliasing at the minimum sampling rate, the output data
samples may be aliased due to the fact that
the bandwidth of the output of a cubically nonlinear
system can be as large as 3 times the bandwidth of
the input [15]. We show that, under the minimum-sampling-rate
condition, the conventional method fails
to achieve good estimates of the Volterra transfer functions.
To deal with this problem, we propose a new
model which properly accounts for the aliasing effects
of the output. Based on this model, we develop a
minimum-sampling-rate method. By properly accounting
for the aliasing effects of the output, the proposed
method is able to properly identify the Volterra transfer
functions of a cubically nonlinear system by using
the minimally sampled input/output data. The effectiveness
of the proposed method is demonstrated by
computer simulation.

2. The Conventional Method


In this section, we consider identification of cubically
nonlinear systems in the discrete domain. This means
the input/output signal is sampled for further processing.
In taking samples of the signals, one needs to be
aware of the fact that a nonlinear system may generate
new output frequency components which are not in the
input frequency band. For example, due to intermodulation
and harmonic distortions, a cubically nonlinear
system whose input has a bandwidth of $W$ may generate
an output with a bandwidth as large as $3W$ [15].
Therefore, according to the sampling theorem [16], the
sampling frequency needs to be greater than or equal
to $6W$ to avoid aliasing of the output data samples (in
contrast to $2W$ for the linear system case). This is 3
times higher than the required sampling frequency for
linear systems. Therefore, one needs to be careful in
selecting the sampling frequency if the system under
consideration is nonlinear.

Consider a cubically nonlinear system with an input
$x(t)$ and an output $y(t)$, where $x(t)$ has a bandwidth
of $W$. Assuming that $x(t)$ and $y(t)$ are sampled
at the rate $f_s = 6W$ to prevent output data samples
from aliasing, then the discrete frequency-domain
third-order Volterra series representation of the system
can be written as follows [15]:

$Y(m) = Y_1(m) + Y_2(m) + Y_3(m) + \varepsilon(m), \quad -3M \le m \le 3M$ (1)

$Y_1(m) = \begin{cases} H_1(m) X(m), & |m| \le M \\ 0, & M+1 \le |m| \le 3M \end{cases}$ (2)

$Y_2(m) = \begin{cases} \sum_{i+j=m} H_2(i,j) X(i) X(j), & |m| \le 2M \\ 0, & 2M+1 \le |m| \le 3M \end{cases}$ (3)

$Y_3(m) = \sum_{i+j+k=m} H_3(i,j,k) X(i) X(j) X(k), \quad |m| \le 3M$ (4)

where $X(\cdot)$ and $Y(\cdot)$ are the discrete Fourier transforms
(DFT's) of the input and output, and $Y_1(\cdot)$, $Y_2(\cdot)$,
$Y_3(\cdot)$ are the linear, quadratic, and cubic output terms
of the system. The linear, quadratic, and cubic transfer
functions of the system are denoted by $H_1(\cdot)$, $H_2(\cdot,\cdot)$,
and $H_3(\cdot,\cdot,\cdot)$ respectively in (2)-(4). $\varepsilon(m)$ denotes the
modeling error, and $M$ is the maximum discrete frequency
of $X(\cdot)$ with a non-zero value. If an $N$-point
DFT is used, then for the sampling rate $f_s = 6W$ we
have $M = N \times (W/f_s) = N/6$ ($M$ is rounded to its
nearest integer).
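
The bookkeeping between bandwidth, sampling rate and DFT size can be illustrated with a short sketch. The function names `min_sampling_rate` and `max_input_bin` are illustrative, and the numerical values are arbitrary examples rather than settings taken from this paper.

```python
def min_sampling_rate(order, bandwidth):
    # A system of the given nonlinear order can widen the bandwidth by that
    # factor, so avoiding output aliasing needs fs >= 2 * order * bandwidth.
    return 2.0 * order * bandwidth

def max_input_bin(n_dft, bandwidth, fs):
    # M = N * (W / fs), rounded to the nearest integer.
    return int(n_dft * bandwidth / fs + 0.5)

fs_cubic = min_sampling_rate(3, 1.0)        # 6W for a cubic system
m_cubic = max_input_bin(96, 1.0, fs_cubic)  # N / 6
m_nyquist = max_input_bin(32, 1.0, 2.0)     # N / 2 at the minimum rate
```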

Since the system is excited by an input whose frequency
range is from $-M$ to $M$, the estimation region
of the quadratic transfer function (QTF) $H_2(i,j)$
in the 2-D $(i,j)$ space is a square, as indicated by
the square 'ABCD' in Fig. 1. Note that $Y_2(m)$ includes
contributions from those $H_2(i,j)$'s which satisfy
$i + j = m$. That is, all the points on the line segment
EF in Fig. 1 contribute to $Y_2(m)$. It can be shown that,
by using the symmetry properties $H_2(i,j) = H_2(j,i)$
and $H_2(-i,-j) = H_2^*(i,j)$, the QTF in the square
'ABCD' can be uniquely determined as long as the
QTF in the triangle 'ABO' is known [11]. Similarly,
the estimation region for the cubic transfer function
(CTF) $H_3(i,j,k)$ in the 3-D $(i,j,k)$ space is a cube,
as shown in Fig. 2. Since $Y_3(m)$ includes contributions
from those $H_3(i,j,k)$'s which satisfy $i + j + k = m$,
all the points on the hexagon 'ABCDEF' in
Fig. 2 contribute to $Y_3(m)$. By using the symmetry
properties $H_3(i,j,k) = H_3(j,i,k) = H_3(i,k,j)$ and
$H_3(-i,-j,-k) = H_3^*(i,j,k)$, the CTF in the entire
cubic region can be completely specified by the CTF
in the pentahedron 'OIGHPN' shown in Fig. 3.
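
The index sets behind these regions can be enumerated directly. The sketch below lists the pairs and triples that contribute to one output frequency and groups the triples by their unordered index set, which shows how many ordered terms collapse onto one representative when the permutation symmetry is used. The function names are illustrative.

```python
from collections import Counter

def quadratic_terms(m, M):
    # All (i, j) with i + j = m inside the square |i|, |j| <= M.
    return [(i, m - i) for i in range(-M, M + 1) if -M <= m - i <= M]

def cubic_terms(m, M):
    # All (i, j, k) with i + j + k = m inside the cube |i|, |j|, |k| <= M.
    return [(i, j, m - i - j)
            for i in range(-M, M + 1)
            for j in range(-M, M + 1)
            if -M <= m - i - j <= M]

def cubic_multiplicities(m, M):
    # Number of ordered triples sharing the same unordered index set.
    return Counter(tuple(sorted(t)) for t in cubic_terms(m, M))

pairs = quadratic_terms(3, 2)          # points on the line i + j = 3
counts = cubic_multiplicities(0, 1)    # groups on the plane i + j + k = 0
```

Each group contains 6, 3 or 1 ordered triples, depending on whether the three indices are all distinct, two are equal, or all are equal.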

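By taking into account the aforementioned symmetry
properties of the Volterra kernels, we may rewrite
(1) in a vector form as follows [15]:

$Y(m) = \mathbf{X}^T(m) \mathbf{H}(m) + \varepsilon(m), \quad 0 \le m \le 3M$ (5)

$\mathbf{X}(m) = \begin{cases} [X(m) \ \mathbf{X}_2^T(m) \ \mathbf{X}_3^T(m)]^T, & \text{if } 0 \le m \le M \\ [\mathbf{X}_2^T(m) \ \mathbf{X}_3^T(m)]^T, & \text{if } M+1 \le m \le 2M \\ \mathbf{X}_3(m), & \text{if } 2M+1 \le m \le 3M \end{cases}$ (6)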

Figure 1. The estimation region for the quadratic
transfer function.

Figure 2. The estimation region for the cubic
transfer function before considering symmetry
properties (plane a: i+j+k = 0; plane b: i+j+k = m).

Figure 3. The estimation region for the cubic
transfer function after considering symmetry
properties (plane a: i+j+k = 0; plane b: i+j+k = m).


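$\mathbf{H}(m) = \begin{cases} [H_1(m) \ \mathbf{H}_2^T(m) \ \mathbf{H}_3^T(m)]^T, & \text{if } 0 \le m \le M \\ [\mathbf{H}_2^T(m) \ \mathbf{H}_3^T(m)]^T, & \text{if } M+1 \le m \le 2M \\ \mathbf{H}_3(m), & \text{if } 2M+1 \le m \le 3M \end{cases}$ (7)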


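where the quadratic components $\mathbf{X}_2(m)$ and $\mathbf{H}_2(m)$
are defined by

$\mathbf{X}_2(m) = [S_2(M, m-M), \ S_2(M-1, m-M+1), \ \ldots, \ S_2(i_q(m), j_q(m))]^T$ (8)

$\mathbf{H}_2(m) = [H_2(M, m-M), \ H_2(M-1, m-M+1), \ \ldots, \ H_2(i_q(m), j_q(m))]^T$ (9)

$S_2(i,j) = I(i,j) X(i) X(j)$ (10)

$I(i,j) = \begin{cases} 2 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$ (11)

$i_q(m) = \left\lfloor \frac{m+1}{2} \right\rfloor$ (12)

$j_q(m) = m - i_q(m)$ (13)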

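and the cubic components $\mathbf{X}_3(m)$ and $\mathbf{H}_3(m)$ are
defined by

$\mathbf{X}_3(m) = [S_3(M, j_{cb}(m), k_{cb}(m)), \ S_3(M, j_{cb}(m)-1, k_{cb}(m)+1), \ \ldots, \ S_3(i_c(m), j_{ce}(m), k_{ce}(m))]^T$ (14)

$\mathbf{H}_3(m) = [H_3(M, j_{cb}(m), k_{cb}(m)), \ H_3(M, j_{cb}(m)-1, k_{cb}(m)+1), \ \ldots, \ H_3(i_c(m), j_{ce}(m), k_{ce}(m))]^T$ (15)

$S_3(i,j,k) = I(i,j,k) X(i) X(j) X(k)$ (16)

$I(i,j,k) = \begin{cases} 6 & \text{if } i \ne j \ne k \\ 1 & \text{if } i = j = k \\ 3 & \text{otherwise} \end{cases}$ (17)

$i_c(m) = \left\lfloor \frac{\cdots}{3} \right\rfloor$ (18)

$j_{cb}(m) = M - r[M - m]$ (19)

$k_{cb}(m) = m - M - j_{cb}(m)$ (20)

$j_{ce}(m) = \left\lfloor \frac{m - i_c(m) + 1}{2} \right\rfloor$ (21)

$k_{ce}(m) = m - i_c(m) - j_{ce}(m)$ (22)

$r[t] = \begin{cases} t & \text{if } t > 0 \\ 0 & \text{otherwise} \end{cases}$ (23)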
(23) 
where $\lfloor v \rfloor$ denotes the largest integer less than
or equal to $v$. The reader is referred to [15] for the
derivation of the above formulas.

By minimizing the mean square error $E[|\varepsilon(m)|^2]$ in
(5), one obtains the optimal minimum mean square
error estimate of $\mathbf{H}(m)$, say $\hat{\mathbf{H}}(m)$, as follows:

$\hat{\mathbf{H}}(m) = E[\mathbf{X}^*(m) \mathbf{X}^T(m)]^{-1} E[\mathbf{X}^*(m) Y(m)]$ (24)

Note that $E[\mathbf{X}^*(m) \mathbf{X}^T(m)]$ is composed of higher-order
auto-moment spectra of $X(\cdot)$ up to 6th order,
and $E[\mathbf{X}^*(m) Y(m)]$ is composed of higher-order
cross-moment spectra between $X(\cdot)$ and $Y(\cdot)$ up to 4th order.
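
The estimator of Eq. (24) amounts to solving a set of normal equations in which the expectations are replaced by averages over data segments. The following is a minimal sketch under stated assumptions, not the implementation used in this paper: the vectors are short hypothetical complex lists rather than the structured vectors of (6) and (7), the data are synthetic and noise-free, and the helper names `solve` and `mmse_estimate` are illustrative.

```python
import random

def solve(A, b):
    # Gaussian elimination with partial pivoting; entries may be complex.
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def mmse_estimate(X_segments, Y_segments):
    # Solve E[X* X^T] H = E[X* Y] with segment averages as expectations.
    n = len(X_segments[0])
    K = len(X_segments)
    R = [[sum(xs[a].conjugate() * xs[b] for xs in X_segments) / K
          for b in range(n)] for a in range(n)]
    p = [sum(xs[a].conjugate() * ys
             for xs, ys in zip(X_segments, Y_segments)) / K
         for a in range(n)]
    return solve(R, p)

rng = random.Random(0)
H_true = [1 + 2j, -0.5j, 0.3 + 0j]               # hypothetical coefficients
X_segments = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(3)]
              for _ in range(50)]
Y_segments = [sum(x * h for x, h in zip(xs, H_true)) for xs in X_segments]
H_hat = mmse_estimate(X_segments, Y_segments)
```

With noise-free synthetic data the estimate reproduces `H_true` up to rounding error; with noisy data it is the least-squares fit over the segments.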

3. The Proposed Minimum-Sampling-Rate Method

In this section, we consider sampling the input/output
signal of a cubically nonlinear system at
the minimum sampling rate. That is, the sampling
frequency $f_s$ is set to $2W$, where $W$ is the bandwidth of
the system input. Since the output may have a bandwidth
as wide as $3W$, the resulting output data samples
are aliased. Under such circumstances, the relation
between $X(\cdot)$ and $Y(\cdot)$ described by (1) is no longer
appropriate. In fact, the total quadratic output $Y_q(m)$
in $Y(m)$ can be written as follows [17]:

$Y_q(m) = Y_2(m) + Y_2(-N + m), \quad 0 \le m \le M$ (25)

Similarly,  the  total  cubic  output  Yc{m)  in  Y{m)  can  be 
written  as  follows: 

Fc(m)  =  Ys{m)-hYs{m  +  N)-\-Y3{^N-\-m), 

0  <  m  <  M  (26) 
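
The folding of spectral images described above can be checked with a small sketch. It relies on a general DFT identity: keeping every third sample of a length-3N sequence gives an N-point DFT equal to the sum of the three spectral images divided by 3. The setup (a random input cubed, decimation by 3, the helper name `folded_spectrum`) and the 1/3 scaling are assumptions of this illustration and need not match the normalization used in the paper.

```python
import numpy as np

def folded_spectrum(Z, factor):
    """Sum the `factor` spectral images of a length factor*N DFT, scaled by 1/factor."""
    Z = np.asarray(Z)
    n_bins = Z.size // factor
    # Row r of the reshaped array holds Z[r*N : (r+1)*N], i.e. Z[m + r*N].
    return Z.reshape(factor, n_bins).sum(axis=0) / factor

rng = np.random.default_rng(0)
N = 32
x = rng.standard_normal(3 * N)
z = x ** 3                        # cubic term, sampled at three times the low rate
Z_fine = np.fft.fft(z)            # 3N-point DFT, no aliasing
Z_coarse = np.fft.fft(z[::3])     # N-point DFT of the undersampled cubic term
Z_folded = folded_spectrum(Z_fine, 3)
```

Because the 3N-point DFT is periodic, the image at bin m + 2N is the same as the one at m - N, which is the form in which the third term appears in (26).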

From the above analysis we see that an appropriate
model for the minimum-sampling-rate case is as
follows:

$$Y(m) = Y_l(m) + Y_q(m) + Y_c(m) + \epsilon(m) \qquad (27)$$

where $Y_l(m) = Y_1(m)$. The discrete model described
by (27) can be rewritten in a vector form as follows:

$$Y(m) = \mathbf{H}^T(m)\mathbf{X}(m) + \epsilon(m), \quad m = 0, \ldots, M \qquad (28)$$

where 


X(m) 

=  [X(m)  X^(m)  Xj(m)]'^, 

0  <  m  <  M 

(29) 

H(m) 

0  <  m  <  M 

(30) 

The  model  described  by  (28)  is  referred  to  as  the 
minimum-sampling-rate  (MSR)  model  in  the  remain¬ 
der  of  this  paper. 

By  comparing  (28)  to  (5),  we  see  similar  vector 
forms.  Therefore,  the  same  MMSE  technique  used 
to  solve  (5)  can  be  applied  to  (28).  This  method 
is  referred  to  as  the  minimum-sampling-rate  (MSR) 
method  in  the  remainder  of  this  paper. 

4. Computer Simulation

To  demonstrate  the  advantage  of  the  proposed  ap¬ 
proach,  we  conducted  a  simulation  using  the  following 
cubically  nonlinear  system: 

$$y(n) = -0.64\,x(n) + x(n-2) + 0.9\,x^2(n) + x^2(n-1) + 0.6\,x^3(n) - 0.3\,x^3(n-1) + e(n) \qquad (31)$$

The input $x(n)$ and the additive noise $e(n)$ used in (31)
are zero-mean white Gaussian random signals with an
SNR of 40 dB.

We  first  collected  16,000  input/output  data  samples 
and  divided  them  into  5,000  segments  of  32  samples 
each.  The  32-point  DFT’s  of  the  data  segments  were 
calculated  and  then  used  by  the  conventional  and  the 
MSR  methods  to  estimate  the  Volterra  transfer  func¬ 
tions.  Note  that  since  the  input  is  white  within  the 
discrete  frequency  range  (i.e.,  0  to  16),  the  maximum 
discrete  frequency  of  the  input  is  equal  to  the  dis¬ 
crete  Nyquist  frequency  16.  This  corresponds  to  the 
minimum-sampling-rate  condition  and  thus  the  out¬ 
put  data  samples  are  aliased.  For  this  case,  the  lin¬ 
ear,  quadratic,  and  cubic  transfer  function  estimates 
obtained  by  the  conventional  method  achieved  a  nor¬ 
malized  mean  square  error  (NMSE)  of  0.0097,  0.0020, 
and  0.0356,  respectively.  On  the  other  hand,  the  esti¬ 
mated  linear,  quadratic,  and  cubic  transfer  functions 
obtained  by  the  MSR  method  achieved  a  NMSE  of 
0.0043,  0.0008,  and  0.0065,  respectively.  These  results 
indicate  that  the  estimated  Volterra  transfer  functions 
obtained  by  the  MSR  method  are  more  accurate  than 
the  ones  obtained  by  the  conventional  method  under 
the  minimum-sampling-rate  condition. 

5. Conclusion

In  this  paper,  we  have  developed  a  practical  method 
for  cubically  nonlinear  system  identification  using  in¬ 
put  and  output  data  sampled  at  the  minimum  sampling 
rate.  The  benefit  of  using  the  minimum  sampling  rate 
is  that  it  significantly  reduces  the  required  processing 
speed  of  the  circuits  and  the  required  computation.  We 
showed  that,  under  such  a  sampling  rate,  the  proposed 
method outperforms the conventional method. The advantage
of the proposed method over the conventional
one has been demonstrated by computer simulation.
Abstract

Natural  images  contain  considerable  statistical  redun¬ 
dancies  beyond  the  level  of  second-order  correlations.  To 
identify  the  nature  of  these  higher-order  dependencies,  we 
analyze  the  bispectra  and  trispectra  of  natural  images. 
Our  investigations  reveal  substantial  statistical  dependen¬ 
cies  between  those  frequency  components  which  are  aligned 
to  each  other  with  respect  to  orientation.  We  argue  that  op¬ 
erators  which  are  selective  to  local  intrinsic  dimensional¬ 
ity  can  optimally  exploit  such  redundancies.  We  also  show 
that  the  polyspectral  structure  we  find  for  natural  images 
helps  to  understand  the  hitherto  unexplained  superiority 
of  orientation-selective  filter  decompositions  over  isotropic 
schemes  like  the  Laplacian  pyramid.  However,  any  essen¬ 
tially  linear  scheme  can  only  partially  exploit  this  higher- 
order  redundancy.  We  therefore  propose  nonlinear  i2D- 
selective  operators  which  exhibit  close  resemblance  to  hy¬ 
percomplex  and  end-stopped  cells  in  the  visual  cortex.  The 
function of these operators can be interpreted as a higher-order
whitening of the input signal.


1.  Introduction 

The  extraordinarily  large  amount  of  information  con¬ 
tained  in  images  of  natural  scenes  calls  for  efficient  data 
compression  techniques.  In  order  to  exploit  redundancies, 
most  of  today’s  image  encoding  schemes  rely  primarily  on 
second-order  statistics  in  conjunction  with  linear  systems 
theory.  Examples  are  predictive,  sub-band  (wavelet),  and 
transform coding (e.g. KLT). However, even such common
image structures as locally oriented lines and edges cannot
be represented by second-order statistics (for a detailed
argument see [10]). Vector quantization could theoretically
overcome these limitations but suffers from combinatorial
explosion.


An alternative approach can be derived from the concept
of intrinsic dimensionality, which relates the degrees of freedom
provided by a signal domain to the degrees of freedom
actually used by a given signal [8]. This concept provides
a hierarchy of local image signals in terms of different degrees
of redundancy:

i0D-signals are constant, i.e. $u(x, y) = \text{const.}$ within a
local window.

i1D-signals can locally be approximated by a function of
only one variable, i.e. $u(x, y) = u(ax + by)$. Examples
are straight lines and edges. Sinusoidal gratings,
the eigenfunctions of linear systems, are also members
of this class.

i2D-signals are neither i0D nor i1D. Examples are corners,
junctions, curved lines and edges, etc.
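
The three classes can be made concrete with a small sketch. It labels a local window by the numerical rank of its gradient structure tensor: rank 0 for constant windows, rank 1 when all gradients point along one direction, rank 2 otherwise. The structure tensor is a stand-in chosen for this illustration; it is not the operator used in the paper, and the function name and tolerances are assumptions.

```python
import numpy as np

def intrinsic_dimension(patch, rel_tol=1e-6, abs_tol=1e-12):
    """Return 0, 1 or 2 from the rank of the local gradient structure tensor."""
    patch = np.asarray(patch, dtype=float)
    g_row, g_col = np.gradient(patch)
    # Drop the border, where one-sided differences are used.
    g_row = g_row[1:-1, 1:-1]
    g_col = g_col[1:-1, 1:-1]
    J = np.array([[np.sum(g_col * g_col), np.sum(g_col * g_row)],
                  [np.sum(g_col * g_row), np.sum(g_row * g_row)]])
    eigvals = np.linalg.eigvalsh(J)          # ascending order
    if eigvals[-1] <= abs_tol:
        return 0                             # no variation at all: i0D
    return int(np.sum(eigvals > rel_tol * eigvals[-1]))

rows, cols = np.mgrid[0:16, 0:16].astype(float)
flat = np.full((16, 16), 3.0)                # u(x, y) = const
grating = np.sin(0.5 * cols + 0.3 * rows)    # u(x, y) = u(ax + by)
corner = rows * cols                         # varies in two directions
```

With these windows, `intrinsic_dimension` distinguishes the constant patch, the grating and the bilinear patch as i0D, i1D and i2D respectively.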

The hierarchy of intrinsic dimensionality seems to be reflected
in biological vision systems, which are well adapted
to the statistical properties of the environment. Evolution
caused not only the development of linear isotropic and
orientation-selective neurons, but also of nonlinear i2D-selective
cells known as ’hypercomplex’ or ’end-stopped’.
For example, in primates more than half of the neurons in
area V2 of the visual cortex are selective to i2D-stimuli
[5]. It has also been shown that the least redundant i2D-information
is already sufficient for a reconstruction of the
original image signal [1].

2.  Limitations  of  the  second-order  approach 

Our statistical investigations of natural images revealed
that the hierarchy in terms of intrinsic dimensionality is reflected
in different class probabilities [6] [7]. In this hierarchy
i0D-signals are the most common, and hence most
redundant ones. The corresponding type of redundancy can
be related to second-order statistics, in the form of the characteristic
strong emphasis of low frequency components.

Figure 1. Sample image of an ergodic
orientation-only process. This image completely
lacks second-order correlations as
can be seen from the autocorrelation function
shown in the inset (cf. [9]).

The next most common class of signals are the i1D-signals,
and any complete statistical description has to take them
into account, too. Since the investigations of Oppenheim
and Lim it has been known that the phase relations between the
frequency components play an important role in the representation
of the structural properties of images [4]. In this
context the location of events such as lines or points has
been identified as a property that cannot be represented by
the Fourier amplitude, i.e. by second-order statistics, but
requires the availability of Fourier phase. We have more
recently extended this critique by suggesting that a further
important deficit of the second-order approach has to
be seen in its inability to represent oriented
structure (i1D-signals) [10]. In order to illustrate this, we
have constructed an orientation-only image that is full of
oriented structure but lacks any second-order dependencies,
i.e. it is ’white’ (Fig. 1). Since second-order theory is insufficient
to capture the oriented structure in this image, we
have concluded that ’local orientedness’ is a higher-order
statistical property [10]. This point will be pursued further
in the present paper.

3.  Higher-order  statistics  of  natural  images 

Surprisingly little is known about the higher-order statistics
of natural images. In previous studies we noted
substantial statistical dependencies remaining between the
(second-order uncorrelated) outputs of orientation-selective
filter decompositions (wavelets) [6] [7] [10] [11].
Here we use higher-order spectra to investigate these dependencies
in a more systematic way. The bispectrum
$C_3^u(f_{x1}, f_{y1}, f_{x2}, f_{y2})$ is given by the Fourier transform
of the third-order cumulant $c_3^u(x_1, y_1, x_2, y_2)$ of a stationary
random process $\{u(x, y)\}$. Alternatively, the Fourier-Stieltjes
representation of this process offers the possibility
to express the bispectrum directly in terms of the components
$dU(f_x, f_y)$ [3]:


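$$E[dU(f_{x1}, f_{y1}) \cdot dU(f_{x2}, f_{y2}) \cdot dU^*(f_{x3}, f_{y3})] = \begin{cases} C_3^u(f_{x1}, f_{y1}, f_{x2}, f_{y2})\,df_{x1}\,df_{y1}\,df_{x2}\,df_{y2} & \text{if } f_{x1} + f_{x2} = f_{x3} \text{ and } f_{y1} + f_{y2} = f_{y3} \\ 0 & \text{else} \end{cases}$$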


From  this  equation  it  is  apparent  that  the  bispectrum  is  a 
measure  for  the  statistical  dependencies  between  three  fre¬ 
quency  components,  the  sum  of  which  equals  zero.  A  direct 
computation  in  the  frequency  domain  can  also  be  derived 
from  this  notation  [3]. 
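
As a rough illustration of such a direct frequency-domain computation, the sketch below estimates the bispectrum at one pair of 2D frequencies by averaging the triple product U(f1) U(f2) U*(f1 + f2) over image patches. The function name, the patch-wise averaging, the per-patch mean removal and the absence of windowing or normalization are assumptions of this sketch, not details taken from [3].

```python
import numpy as np

def bispectrum_at(patches, k1, k2):
    """Average U[k1] * U[k2] * conj(U[k1 + k2]) over patches.

    patches: array of shape (n_patches, rows, cols).
    k1, k2: (row_frequency, col_frequency) DFT index pairs.
    """
    patches = np.asarray(patches, dtype=float)
    n_rows, n_cols = patches.shape[-2:]
    centered = patches - patches.mean(axis=(-2, -1), keepdims=True)
    U = np.fft.fft2(centered, axes=(-2, -1))
    # The third component is fixed by the constraint that the frequencies sum to zero.
    k3 = ((k1[0] + k2[0]) % n_rows, (k1[1] + k2[1]) % n_cols)
    triple = (U[..., k1[0], k1[1]] * U[..., k2[0], k2[1]]
              * np.conj(U[..., k3[0], k3[1]]))
    return triple.mean()

# Hypothetical test pattern: three phase-coupled gratings on an 8 x 8 patch.
N = 8
r, c = np.mgrid[0:N, 0:N]
coupled = (np.cos(2 * np.pi * r / N)
           + np.cos(2 * np.pi * 2 * c / N)
           + np.cos(2 * np.pi * (r + 2 * c) / N))
value = bispectrum_at(coupled[None, :, :], (1, 0), (0, 2))
```

For this pattern each of the three DFT coefficients equals N²/2, so the triple product is N⁶/8; a pattern without the third, coupled grating would give zero at the same frequency pair.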

We have extensively investigated the bispectra of natural
images and compared them to those of noise images
with almost identical first- and second-order statistics. Only
for the natural images did we find a concentration of bispectral
’energy’ exactly in those regions where the frequency
components are aligned to each other with respect to their
orientation (an example is shown in Fig. 2). Systematic
statistical dependencies of a similar type have also shown
up in our preliminary investigations using trispectra (see
Fig. 3). The trispectral ’energy’ is concentrated around the
aligned frequency triples for natural images as it is for samples
from the orientation-only process, whereas it shows a
substantially different distribution for random images which
are equivalent to the natural ones in their 1st-order pdf and
2nd-order acf. This prompts us to conclude that the predominance
of oriented i1D-structures in natural images is
reflected in the concentration of polyspectral ’energy’ of co-oriented
frequency-tuples.


4.  Exploitation  of  the  higher-order  statistics  of 
natural  images 

4.1.  Linear  filter  decompositions 

If local ’orientedness’ is a higher-order property not reflected
in the standard second-order statistics, shouldn’t
nonlinear operators be required for its exploitation? This is
only partially correct. While it is certainly true that an exhaustive
exploitation of higher-order dependencies will, in
general, require nonlinear operations, a partial exploitation
may well be achieved by suitable linear operators. In fact,
the advantages of certain linear decompositions can only
be properly understood once the higher-order dependencies
are taken into account. We have suggested that this is the
case with the encoding of images by orientation-selective
decompositions like the Cosine transform or wavelet-like
transforms [10].

Figure 2. (a, left) Bispectral magnitude $|C_3(f_{x1}, f_{y1}, f_{x2}, f_{y2})|$ of a natural image. Shown are several
sections with $f_{x1}$ = const. and $f_{y1}$ = const. Note the elongated concentration of bispectral ‘energy’
in regions where the frequency components are aligned to each other with respect to orientation.
(b, right) Bispectrum of a noise image whose first-order pdf and second-order acf are approximately
equivalent to those of the natural image. Here the ‘energy’ concentration is more circular, around the
void points ($f_{x1} + f_{x2} = 0$, $f_{y1} + f_{y2} = 0$).

In our view, the usual interpretation of the relationship
between such transforms and the statistics of natural images
requires a revision, although the current view appears to be
consistent at first glance: The KLT yields oriented basis
functions, orientation obviously is an important structural
property of natural images, hence usage of the optimally
adapted, i.e. oriented, basis functions for image compression
will yield positive results. A closer look, however, reveals
some problems with this interpretation.

In fact, strict application of the standard second-order
reasoning would predict that the compression performance
of an isotropic decomposition, like the Laplacian pyramid,
should be close to the one achievable with a further
orientation-selective splitting of the frequency bands
[10]. This is due to the fact that no deviation from spectral
flatness can be exploited along the orientation variable.
However, practical experience and statistical investigations
showed the contrary: substantial additional savings in data
rate can be obtained if an image is not only decomposed by
radial bandpass functions but is also split up with respect to
orientation, e.g. [12].

While this fact cannot be understood within the framework
of second-order statistics, our results indicate that it
might be explained by taking into account the structure of
higher-order spectra, since only orientation-selective filter
decompositions can appropriately adapt to the concentration
of higher-order spectral ’energy’ along the ’i1D-lines’
as revealed by our measurements. An isotropic decomposition,
like the Laplacian pyramid, will inevitably yield a
mismatch to this specific type of structure.

4.2.  Nonlinear  operators 

While an orientation-selective linear decomposition can
partially exploit higher-order dependencies, nonlinear operators
will be required in general. This raises the question of
appropriate types of nonlinear operators. We will argue that
i2D-selective operators are suitable candidates.

Starting from the Volterra-Wiener expansion of nonlinear
functionals, a quite general class of i2D-selective operators
can be derived. The Volterra series relates the input
$u_1(x, y)$ of a nonlinear, shift-invariant system to its output



Figure  3.  (a,  left)  Trispectral  magnitude  |C4(fxi,fyi,fx2,fy2,fx3,fy3)|  of  a  natural  image.  Note  the 
elongated  concentration  of  trispectral  ‘energy’  in  regions  where  the  frequency  components  are 
aligned  to  each  other,  (b,  middle)  Trispectrum  of  the  contour  noise  image  given  in  Fig.  1. 
(c,  right)  Trispectrum  of  a  noise  image  whose  first-order  pdf  and  second-order  acf  are  approximately 
equivalent  to  that  of  the  natural  image.  Here  the  ‘energy’-concentration  is  more  circular. 


$u_2(x, y)$ in the following way:

$$u_2(x, y) = \iint h_1(x_1, y_1)\,u_1(x - x_1, y - y_1)\,dx_1\,dy_1 + \iiiint h_2(x_1, y_1, x_2, y_2)\,u_1(x - x_1, y - y_1)\,u_1(x - x_2, y - y_2)\,dx_1\,dy_1\,dx_2\,dy_2 + \ldots \qquad (1)$$

The quadratic part of Eq. (1) may be expressed in the frequency
domain as

$$U_2(f_x, f_y) = \iint \hat{U}_2(f_{x1}, f_{y1}, f_x - f_{x1}, f_y - f_{y1})\,df_{x1}\,df_{y1}$$

where

$$\hat{U}_2(f_{x1}, f_{y1}, f_{x2}, f_{y2}) = H_2(f_{x1}, f_{y1}, f_{x2}, f_{y2}) \cdot U_1(f_{x1}, f_{y1}) \cdot U_1(f_{x2}, f_{y2}) \qquad (2)$$

is the expanded output spectrum and $H_2(f_{x1}, f_{y1}, f_{x2}, f_{y2})$
is the Fourier transform of the second-order Volterra kernel
$h_2(x_1, y_1, x_2, y_2)$. Note that Eq. (2) may be regarded
as the weighting of an AND-like conjunction between frequency
components. A necessary and sufficient condition
for a quadratic Volterra operator to be insensitive to i0D-
and i1D-signals is given by [2]:

$$H_2(f_{x1}, f_{y1}, f_{x2}, f_{y2}) = 0 \quad \forall\; f_{x1} \cdot f_{y2} = f_{y1} \cdot f_{x2}$$

The operation of systems adhering to this condition can be
regarded as a blocking of aligned frequency components
(see Fig. 4, an example is provided in Fig. 5). Since the redundancies
of natural images are reflected in the concentration
of polyspectral magnitude at aligned frequency components,
i2D-selective systems have the potential to perform a
kind of ’higher-order whitening’, just as differentiating linear
operators have the potential to whiten the $1/f^2$-decay of
the power spectrum of natural images.

Figure 4. Illustration of the forbidden zones
(indicated as black lines) for an i2D-selective
quadratic Volterra kernel $H_2(f_{x1}, f_{y1}, f_{x2}, f_{y2})$
in the frequency domain. A system whose
symmetric kernel vanishes at the forbidden
zones may be regarded as blocking frequency
components of equal orientation [2].
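
One simple quadratic operator that satisfies the vanishing condition on aligned frequency pairs is the determinant of the Hessian, $u_{xx}u_{yy} - u_{xy}^2$: its symmetrized frequency-domain kernel is proportional to $(f_{x1} f_{y2} - f_{y1} f_{x2})^2$. The sketch below uses it purely as an illustrative example; it is not the operator designed in the paper, and the FFT-based derivatives assume a periodic image.

```python
import numpy as np

def det_hessian_response(u):
    """Quadratic response u_xx * u_yy - u_xy**2 using spectral derivatives."""
    u = np.asarray(u, dtype=float)
    rows, cols = u.shape
    ky = 2 * np.pi * np.fft.fftfreq(rows)[:, None]   # angular frequency along rows (y)
    kx = 2 * np.pi * np.fft.fftfreq(cols)[None, :]   # angular frequency along columns (x)
    U = np.fft.fft2(u)
    u_xx = np.fft.ifft2(-(kx ** 2) * U).real
    u_yy = np.fft.ifft2(-(ky ** 2) * U).real
    u_xy = np.fft.ifft2(-(kx * ky) * U).real
    return u_xx * u_yy - u_xy ** 2

N = 32
y, x = np.mgrid[0:N, 0:N]
constant = np.ones((N, N))                                           # i0D
grating = np.cos(2 * np.pi * (2 * x + 3 * y) / N)                    # i1D
plaid = np.cos(2 * np.pi * 2 * x / N) + np.cos(2 * np.pi * 3 * y / N)  # i2D

resp_constant = np.abs(det_hessian_response(constant)).max()
resp_grating = np.abs(det_hessian_response(grating)).max()
resp_plaid = np.abs(det_hessian_response(plaid)).max()
```

The response is numerically zero for the constant image and the single grating, and clearly non-zero for the superposition of two differently oriented gratings.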
Figure 5. Application of i2D-selective operators to natural images: (a, left) Example of a natural
image. (b, middle) Frequency-domain Volterra kernel of an i2D-selective operator. Note that
the i2D-selectivity is guaranteed since the ’forbidden zones’ of Fig. 4 are taken into account.
(c, right) Image resulting from the application of the i2D-selective Volterra operator (only positive
responses are shown). The original image can be reconstructed from such i2D-only representations
if multiple scales are taken into account [1].


5.  Conclusion 

The frequent occurrence of local oriented features is a
basic structural property of natural images that cannot be
captured by second-order statistics. Investigations of the
polyspectra revealed that this property is reflected in the
concentration of polyspectral magnitude in those regions
where the frequency components are aligned in orientation.
This fact can partially be exploited by orientation-selective
linear filter decompositions. However, a full exploitation requires
nonlinear schemes, for which i2D-selective operators
are suitable candidates.

Just as the distinction between a constant and a varying
signal can be regarded as an elementary operation in linear
signal processing, the distinction between i1D- and i2D-signals
can be regarded as a basic operation in the nonlinear
processing of images. In this sense, i2D-operators can be
seen as performing a ’nonlinear differentiation’, being capable
of exploiting the elementary redundancies of natural
images.
Abstract

The  objective  of  the  paper  is  to  introduce  a  new  adap¬ 
tive  filtering  algorithm  for  estimating  frequency-domain 
second-order  Volterra  filter  coefficients.  The  approach 
rests  upon  the  normalized  LMS  (NLMS)  algorithm  and 
the frequency-domain block LMS algorithm. The utilization
of the normalized LMS algorithm facilitates the choice
of a proper step size, with which the adaptive frequency-domain
Volterra filter is guaranteed to be convergent
in the mean-squared sense, and improves the convergence
rate. The frequency-domain block LMS algorithm estimates
frequency-domain  second-order  Volterra  filter  coefficients 
which  correspond  to  the  DFT  of  the  time-domain  Volterra 
filter  coefficients. 


1.  Introduction 

In the last two decades, nonlinear digital filters, and
in particular digital Volterra filters [1], have become the
subject of increasing interest. In many cases, the core of
the problem is the identification of an unknown nonlinear
system. In relation to the identification methodology,
the computational simplicity of the well-known LMS algorithm
has considerable appeal. However, it is well known
that an LMS-type Volterra filter suffers from slow convergence
even for independent identically distributed Gaussian
inputs [2]. This phenomenon is rooted in the fact
that the quadratic component of the second-order Volterra
filter takes the products of the linear inputs as its input
sequence. Such correlated inputs result in a large eigenvalue
spread of the corresponding correlation matrix, which
leads to slow convergence. This holds for both the time- and
frequency-domain adaptive Volterra filters.


'This  work  was  supported  by  GYONAE  Research  Project  97. 


Several  authors  have  demonstrated  the  utility  of  time- 
and  frequency-domain  Volterra  filters  updated  using  an 
LMS-type  procedure  for  a  range  of  applications  [3,  4, 
5],  where  different  step  sizes  were  applied  to  linear  and 
quadratic  filters  in  order  to  overcome  the  slow  convergence 
problem.  However,  it  is  still  difficult  to  choose  the  optimal 
step  sizes  for  the  linear  and  quadratic  filters.  In  addition  to 
the  problems  mentioned  previously,  the  frequency-domain 
adaptive  Volterra  filters  suffer  from  the  wrap-around  error 
because  of  the  periodicity  of  the  discrete  frequency  domain. 
This  can  be  avoided  by  the  block  LMS  algorithm  [6]  based 
on  the  overlap-save  method. 

The objective of the paper is to introduce a new adaptive
filtering algorithm for estimating frequency-domain
second-order Volterra filter coefficients, which correspond
to the DFT of the time-domain Volterra filter coefficients.
In order to avoid the wrap-around error in the frequency-domain
processing, the frequency-domain Volterra filter utilizes
the frequency-domain BLMS algorithm [6]. According
to [7], the conventional linear adaptive filters based on
the NLMS algorithm obtain rapid convergence for highly
correlated signals by reducing a large eigenvalue spread.
With similar reasoning, our approach rests upon the NLMS
algorithm in order to solve the slow convergence problem.

The remainder of this paper is organized as follows.
The following section describes the second-order time- and
frequency-domain Volterra filters, and presents the BLMS
algorithm for the frequency-domain second-order Volterra
filter. In Section 3, we develop the normalized block LMS
algorithm for the frequency-domain Volterra filters. Section
4 contains a numerical example to demonstrate the validity
of the proposed algorithm by comparing it with the time-domain
LMS algorithm with multiple step sizes. Finally, the paper
is concluded in Section 5.
2. Second-Order Volterra Filters


If we assume that the nonlinear system to be identified
is stable and of finite memory, and has nonlinearities up to
second order, the second-order Volterra filter can approximate
the output of the nonlinear system by its sampled data
form. The output can be represented as follows:

$$y_t(n) = \sum_{i=0}^{N-1} h_1(i)\,x(n-i) + \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} h_2(i,j)\,x(n-i)\,x(n-j) + e(n) \qquad (1)$$

where $y_t(\cdot)$ and $x(\cdot)$ are observed responses and excitations,
respectively. $h_1(\cdot)$ and $h_2(\cdot,\cdot)$ are linear and quadratic
Volterra filter coefficients. When $y(n)$ is the output of the
second-order Volterra filter,

$$e(n) = y_t(n) - y(n). \qquad (2)$$
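
A direct evaluation of the filter output (the two sums in (1), without the error term) can be sketched as follows. The function name and the convention that x(n) = 0 for n < 0 are assumptions of this example.

```python
import numpy as np

def volterra2_output(x, h1, h2):
    """Second-order Volterra filter output, assuming x(n) = 0 for n < 0."""
    x = np.asarray(x, dtype=float)
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    N = h1.size
    padded = np.concatenate([np.zeros(N - 1), x])
    y = np.empty(x.size)
    for n in range(x.size):
        # window = [x(n), x(n-1), ..., x(n-N+1)]
        window = padded[n:n + N][::-1]
        y[n] = h1 @ window + window @ h2 @ window
    return y

# Small hypothetical example with memory N = 2.
h1 = [1.0, 0.5]
h2 = [[0.1, 0.0],
      [0.0, 0.2]]
y = volterra2_output([1.0, 2.0, 3.0], h1, h2)
```

The quadratic term is written as a quadratic form in the input window, which is equivalent to the double sum over (i, j).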

In the discrete frequency domain based on an M-point
DFT, the second-order Volterra filter can be represented as
follows [8]:

$$Y_t(m) = H_1(m)X(m) + \sum_{p=0}^{M-1}\sum_{q=0}^{M-1} H_2(p,q)\,X(p)\,X(q)\,\delta_M(m-p-q) + E(m) \qquad (3)$$

where

$$\delta_M(m) = \begin{cases} 1, & (m \text{ modulo } M) = 0 \\ 0, & (m \text{ modulo } M) \neq 0 \end{cases} \qquad (4)$$

In (3), $Y_t(\cdot)$, $X(\cdot)$, $H_1(\cdot)$, $H_2(\cdot,\cdot)$, and $E(\cdot)$ are DFTs of
$y_t(\cdot)$, $x(\cdot)$, $h_1(\cdot)$, $h_2(\cdot,\cdot)$, and $e(\cdot)$, respectively. Note that
the frequency-domain Volterra filter (3) selects input frequency
component pairs $(p, q)$ using the delta function defined
by modulo $M$.
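
The pair selection performed by the modulo-M delta function can be sketched in code: for each output bin m, only the M pairs (p, q) with q = (m - p) mod M contribute. The function name is an assumption, and the sketch evaluates the filter part of (3) literally, without the error term and without any DFT scaling factors.

```python
import numpy as np

def volterra2_frequency_output(X, H1, H2):
    """Evaluate H1(m)X(m) + sum over pairs (p, q) with (m - p - q) mod M == 0."""
    X = np.asarray(X, dtype=complex)
    H1 = np.asarray(H1, dtype=complex)
    H2 = np.asarray(H2, dtype=complex)
    M = X.size
    p = np.arange(M)
    Y = H1 * X
    for m in range(M):
        q = (m - p) % M          # the single q selected by the delta for each p
        Y[m] += np.sum(H2[p, q] * X[p] * X[q])
    return Y

# Hypothetical random data for a quick check.
rng = np.random.default_rng(0)
M = 6
X = rng.standard_normal(M) + 1j * rng.standard_normal(M)
H1 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
H2 = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Y = volterra2_frequency_output(X, H1, H2)
```

Using the delta function to pick q directly reduces the double sum over (p, q) to a single sum over p for every output bin.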


Let us denote the input of the frequency-domain
Volterra filter (3) by the complex vector
$\mathbf{X}(m) = [X_0, X_1, \ldots, X_M]^T$, where the superscript $T$ denotes transpose,
and the number of elements is $M + 1$. Each $X_k$ is
determined by

$$X_k = \begin{cases} X(m), & k = 0 \\ X(p_k)X(q_k), & k = 1, \ldots, M \end{cases} \qquad (5)$$

where $(p_k, q_k)$ is selected by $\delta_M(m - p_k - q_k) = 1$. Denote
the frequency-domain Volterra filter coefficient vector
by $\mathbf{H}(m) = [H_0, H_1, \ldots, H_M]$, where

$$H_k = \begin{cases} H_1(m), & k = 0 \\ H_2(p_k, q_k), & k = 1, \ldots, M \end{cases} \qquad (6)$$

Then, the output of the frequency-domain Volterra filter is
given by

$$Y(m) = \mathbf{H}(m)\mathbf{X}(m) \qquad (7)$$

The  block  update  equations  [6]  are  given  by 

H'^+^  (m)  =  (8) 

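where $\kappa$ represents the $\kappa$-th update time and $\mu$ is a step size.
$\nabla^{\kappa}(m)$ denotes an estimate of the gradient at time $\kappa$, and is
given by

$$\nabla^{\kappa}(m) = [\nabla_1^{\kappa}(m), \nabla_2^{\kappa}(p_1, q_1), \ldots, \nabla_2^{\kappa}(p_M, q_M)], \qquad (9)$$

where $(p_k, q_k)$ satisfies $\delta_M(m - p_k - q_k) = 1$, and the
components $\nabla_1^{\kappa}(m)$ and $\nabla_2^{\kappa}(p_k, q_k)$ are computed by the
procedure given in the Appendix.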

Recall that for a linear adaptive filter, convergence
is guaranteed only if the step size $\mu$ is less than $2/\lambda_{max}$
[9], where $\lambda_{max}$ is the largest eigenvalue of the correlation
matrix of the inputs. In addition, when the eigenvalues are
widely spread (that is, the ratio $\lambda_{max}/\lambda_{min}$ is large), the rate
of convergence is limited by the smallest eigenvalue. Similar
conditions hold for the Volterra filter [2]. As shown in
(5), the quadratic input components consist of the products
of the linear components, which results in high correlation
among the input components.
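
The effect of the quadratic inputs on the eigenvalue spread can be illustrated with a small time-domain experiment. The sketch compares the sample correlation matrix of a purely linear input vector with that of a second-order Volterra input vector for white, unit-variance Gaussian input. The memory length of 2, the chosen input vectors and the function name are assumptions of this illustration; for this setup the theoretical eigenvalues of the Volterra correlation matrix are 1, 1, 1, 2 and 4.

```python
import numpy as np

def eigenvalue_spread(inputs):
    """Return (lambda_max / lambda_min, 2 / lambda_max) of the sample correlation matrix."""
    inputs = np.asarray(inputs, dtype=float)
    R = inputs.T @ inputs / inputs.shape[0]
    eigvals = np.linalg.eigvalsh(R)          # ascending order
    return eigvals[-1] / eigvals[0], 2.0 / eigvals[-1]

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
x0, x1 = x[1:], x[:-1]                       # x(n) and x(n-1)

linear_inputs = np.column_stack([x0, x1])
volterra_inputs = np.column_stack([x0, x1, x0 ** 2, x0 * x1, x1 ** 2])

linear_spread, linear_bound = eigenvalue_spread(linear_inputs)
volterra_spread, volterra_bound = eigenvalue_spread(volterra_inputs)
```

Even for white input, adding the product terms raises the spread from about 1 to about 4 and lowers the step-size bound, which is consistent with the slow convergence described above.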


3.  A  Normalized  Block  LMS  Algorithm 


In order to improve the convergence characteristics of
the frequency-domain Volterra filter, we apply the normalized
LMS algorithm [9] to (8), because for correlated input
signals the NLMS algorithm reduces the eigenvalue spread,
which influences the rate of convergence [7].


The normalized block update equation is given by

$$\mathbf{H}^{\kappa+1}(m) = \mathbf{H}^{\kappa}(m) + G\{\alpha^{\kappa}(m)\,\mathbf{X}^{\kappa H}(m) * \mathbf{E}^{\kappa}(m)\} \qquad (10)$$

where the superscript $H$ denotes complex conjugate transpose
and $*$ represents array multiplication. Note that the
step size $\mu$ in (8) is replaced with

$$\alpha^{\kappa}(m) = \frac{\beta}{\epsilon + \mathbf{X}^{\kappa H}(m)\mathbf{X}^{\kappa}(m)}, \qquad (11)$$

where $\beta$ is a fixed scalar with $0 < \beta < 2$ and $\epsilon$ is a small
positive constant that bounds $\alpha^{\kappa}(m)$ when $\mathbf{X}^{\kappa}(m)$ is momentarily
small. $\mathbf{X}^{\kappa H}(m)\mathbf{X}^{\kappa}(m)$ is an estimate of the input
signal power to the Volterra filter in the m-th frequency
bin at time $\kappa$. In (10), $G$ stands for the operation of imposing
a constraint on $\mathbf{X}^{\kappa H}(m) * \mathbf{E}^{\kappa}(m)$ in order to achieve a
linear correlation. The error vector $\mathbf{E}^{\kappa}(m)$ is defined by

E^{m)  =  (12) 

where  each  component  is  obtained  through  the  procedure 
given  in  Appendix. 
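
The role of the normalized step size can be sketched with a simplified, single-bin update. This is only an illustration of the normalization in (11): it omits the gradient constraint G and the block (overlap-save) processing, uses a scalar error per update, and all function names and the synthetic data are assumptions.

```python
import numpy as np

def normalized_step(X, beta, eps=1e-8):
    """alpha = beta / (eps + X^H X) for one frequency bin."""
    X = np.asarray(X, dtype=complex)
    return beta / (eps + np.vdot(X, X).real)

def nlms_update(H, X, Y_t, beta, eps=1e-8):
    """One simplified normalized LMS step for the model Y = H X (no constraint G)."""
    error = Y_t - H @ X
    H_new = H + normalized_step(X, beta, eps) * np.conj(X) * error
    return H_new, error

# Hypothetical noise-free identification of a 5-element coefficient vector.
rng = np.random.default_rng(0)
H_true = rng.standard_normal(5) + 1j * rng.standard_normal(5)
H = np.zeros(5, dtype=complex)
for _ in range(500):
    X = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    H, _ = nlms_update(H, X, H_true @ X, beta=1.0)
```

In this simplified setting each update scales the a-posteriori error by approximately (1 - beta), independently of the input power, which is why any beta between 0 and 2 gives a contraction.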

Table  1.  Linear  and  Quadratic  Filter  Coeffi¬ 
cients  Used  in  the  Volterra  System 


1  hi  (i) 

1  h2(i,j) 

m 

i  0  J 

1  1  1 

2 

3 

0.0582 

-0.1609 

-0.5075 

-0.0175 

0.2408 

-0.0601 

-0.0785 

-0.6678 

0.8759 

0.2165 

0.8759 

0.2876 

4.  An  Example 

The  objective  of  this  example  is  to  demonstrate  the  va¬ 
lidity  of  the  proposed  algorithm  and  to  compare  its  perfor¬ 
mance  with  that  of  the  time-domain  LMS  algorithm  [2].  In 
this  example,  the  adaptive  Volterra  filter  is  used  in  the  sys¬ 
tem  modeling  mode.  The  proposed  adaptive  Volterra  filter 
has  the  equivalent  order  and  number  of  coefficients  in  the 
frequency  domain  consistent  with  the  time-domain  Volterra 
system  to  be  identified.  The  time-domain  adaptive  filter 
based  on  the  LMS  algorithm  has  the  same  order  and  num¬ 
ber  of  coefficients. 


In  this  example,  we  compare  the  performance  of  the  pro¬ 
posed  algorithm  and  the  time-domain  LMS  algorithm  in 
terms  of  coefficient  error  NMSEC  and  filter  output  error 
NMSEO.  The  coefficient  error  NMSEC  is  defined  by 

NMSECin)  =  (13) 


where  h  is  the  time-domain  Volterra  coefficient  vector, 
whose  elements  are  given  in  Table  1.  For  the  proposed 
algorithm,  is  the  inverse  DFT  of  the  estimate  of  the 
frequency-domain  Volterra  coefficients  at  time  k  while 
is  the  estimate  of  the  time-domain  Volterra  coefficients  for 
the  time-domain  LMS  algorithm.  The  filter  output  error  is 
defined  by 


$$\mathrm{NMSEO}(\kappa) = \frac{\|[e(\kappa N), \ldots, e(\kappa N + N - 1)]\|^2}{\|[y_t(\kappa N), \ldots, y_t(\kappa N + N - 1)]\|^2} \qquad (14)$$

where $[e(\kappa N), \ldots, e(\kappa N + N - 1)]$ and
$[y_t(\kappa N), \ldots, y_t(\kappa N + N - 1)]$ are an output error vector
at time $\kappa$ and an observed output vector at time $\kappa$, respectively.

Figure 1. Coefficient error versus number of
iterations.
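
The block-wise output error measure can be computed as in the following sketch, which assumes that the ratio in (14) uses squared Euclidean norms over a block of N consecutive samples. The function and argument names are illustrative.

```python
import numpy as np

def nmseo(e, y_t, block_index, block_len):
    """Ratio of error energy to observed output energy over one block."""
    start = block_index * block_len
    stop = start + block_len
    e_block = np.asarray(e, dtype=float)[start:stop]
    y_block = np.asarray(y_t, dtype=float)[start:stop]
    return np.sum(e_block ** 2) / np.sum(y_block ** 2)

# Hypothetical values: two blocks of length 2.
errors = [1.0, 1.0, 0.0, 0.0]
observed = [2.0, 2.0, 1.0, 1.0]
first_block = nmseo(errors, observed, block_index=0, block_len=2)
second_block = nmseo(errors, observed, block_index=1, block_len=2)
```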

In this example, we consider the second-order Volterra
system with the filter coefficients shown in Table 1, while
the input sequence is chosen to be Gaussian-distributed. A
zero-mean white Gaussian noise of SNR = 100 dB is added
to the system output. For the proposed algorithm, the $\beta$'s are
set to 0.5, 1, and 1.5. For the time-domain LMS algorithm,
as  proposed  in  [2],  the  step  size  of  the  linear  filter  is  set  to 
l/X^ax^  where  is  the  largest  eigenvalue  of  the  auto¬ 
correlation  matrix  of  the  linear  input  vectors  and  that  of  the 
quadratic  filter  is  set  to  where  is  the  largest 

eigenvalue  of  the  autocorrelation  matrix  of  the  quadratic  in¬ 
put  vectors.  The  results  shown  in  the  figures  are  ensemble 
averaged  over  100  independent  experiments.  For  the  time- 
domain  LMS  algorithm,  the  mean  value  of  the  linear  step 
size  is  about  0.6613  x  10“^  over  100  experiments,  while 
that  of  the  quadratic  step  size  is  about  0.0926  x  10“^. 

The coefficient error curves and the output error curves of the two algorithms are shown in Figures 1 and 2, respectively. In the figures, the curve labeled “TDLMS” is obtained by the time-domain LMS algorithm, while the remaining curves, labeled “BETA=0.5”, “BETA=1.0”, and “BETA=1.5”, are obtained by the proposed algorithm with different $\beta$'s of 0.5, 1.0, and 1.5. Note in Figure 1 that the
coefficient  estimates  by  the  proposed  algorithm  converge  to 
the  true  value  with  error  of  about  10”^^  after  about  900  it¬ 
erations  (/?  =  0.5)  while  the  time-domain  LMS  algorithm 
seems  to  need  more  iterations.  As  /?  is  reduced,  the  rate 
Figure 2. Output error versus number of iterations.
Figure  3.  Mean  of  a'^(m)//?  over  m’s  versus 
number  of  iterations. 


of convergence of the proposed algorithm is correspondingly reduced. From Figure 2, we observe that a reduction in the value of $\beta$ has the effect of reducing the fluctuation in the curve. Figure 3 shows the mean values of the
^  *  frequency  bins. 

The proposed algorithm limits the range of $\beta$. Thus, it is somewhat easier to choose a proper step size with which the proposed algorithm is convergent in the mean-squared sense, while the time-domain LMS algorithm requires information on the largest eigenvalue of the correlation matrix of the input sequence to determine the optimal step size.
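To illustrate why the time-domain LMS algorithm needs eigenvalue information, the following Python sketch estimates the largest eigenvalue of the sample autocorrelation matrix of the linear input vectors and takes its reciprocal as the step size. The function name `lms_step_from_input`, the memory length of 4, the record length, and the use of NumPy are assumptions made for illustration; none of them comes from the paper.

```python
import numpy as np

def lms_step_from_input(x, memory):
    """Return (step, lam_max) for linear input vectors built from x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Each row is a delay-line vector [x(n), x(n-1), ..., x(n-memory+1)].
    rows = np.stack(
        [x[memory - 1 - d : len(x) - d] for d in range(memory)], axis=1
    )
    # Sample autocorrelation matrix of the linear input vectors.
    R = rows.T @ rows / rows.shape[0]
    # eigvalsh returns eigenvalues in ascending order for symmetric matrices.
    lam_max = float(np.linalg.eigvalsh(R)[-1])
    return 1.0 / lam_max, lam_max

rng = np.random.default_rng(0)
step, lam_max = lms_step_from_input(rng.standard_normal(4096), memory=4)
print(step, lam_max)
```

For a white, unit-variance input the estimated eigenvalue is close to 1, so the step size is also close to 1; a colored input would spread the eigenvalues and shrink the usable step.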


5.  Summary 

We presented a normalized block LMS algorithm for estimating frequency-domain second-order Volterra filter coefficients. The validity of the proposed algorithm was demonstrated through computer simulation. The use of the normalized LMS algorithm facilitates the choice of a proper step size because the algorithm limits the range of $\beta$ and improves the convergence rate. With use of the block algorithm, the estimated frequency-domain coefficients converge to the DFT of the time-domain Volterra filter coefficients. Although we did not demonstrate it in this paper, the block LMS algorithm provides large savings in computational complexity by replacing the one- and two-dimensional convolutions and correlations with frequency-domain multiplications using the efficient FFT algorithm.


6.  Appendix 

In the following, the bar denotes complex conjugate, and 1D DFT and 2D DFT represent $M$-point and $(M, M)$-point DFT's, respectively. It is also assumed that $M = 2N$, where $N$ is a time-domain memory length and $M$ is a block size for the DFT.

6.1.  Output  Error  Sequence  and  DFT 

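•  1D Error Sequence

$$\mathbf{e}^k = [\underbrace{0, \ldots, 0}_{N},\; e(kN), \ldots, e(kN+N-1)]^T \qquad (15)$$

where $e(n)$ is given by (2).

$$\mathbf{E}^k = [E^k(0), \ldots, E^k(M-1)]^T = \text{1D DFT}\{\mathbf{e}^k\}. \qquad (16)$$

•  2D Error Sequence

$$E_2^k(p,q) = \frac{1}{M} E^k((p+q)_M), \qquad (17)$$

where $(\cdot)_M$ represents the modulo-$M$ operation.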

6.2.  Gradient  Constraints 


•  For the linear component

$$\nabla_1^k = \text{1D DFT}\{[g_1^k(0), \ldots, g_1^k(N-1), 0, \ldots, 0]^T\} \qquad (18)$$

where

$$g_1^k(i) = \text{1D IDFT}\{\bar{X}^k(m) E^k(m)\}, \quad i = 0, \ldots, 2N-1. \qquad (19)$$
•  For the quadratic component

$$\nabla_2^k = \text{2D DFT}\{[g_2^k(i,j)\, w_2(i,j)]\} \qquad (20)$$

where

$$w_2(i,j) = \begin{cases} 1, & 0 \le i, j \le N-1 \\ 0, & \text{otherwise} \end{cases} \qquad (21)$$

and

$$[g_2^k(i,j)] = \text{2D IDFT}\{[\bar{X}^k(p)\, \bar{X}^k(q)\, E_2^k(p,q)]\} \qquad (22)$$

where $i$, $j$, $p$, and $q = 0, \ldots, M-1$.
Abstract

The  effects  of  conventional  data  windows  on  Volterra 
transfer  function  estimation  are  investigated.  The  in¬ 
put/output  data  for  two  known  second-order  systems  are  uti¬ 
lized  to  estimate  the  transfer  functions,  and  the  results  are 
compared  with  true  values.  In  addition,  the  use  of  win¬ 
dow  correction  factors  to  offset  the  bias  introduced  into  the 
higher-order  moment  spectra,  by  the  fact  that  the  data  is  at¬ 
tenuated  at  the  beginning  and  end  of  a  record,  is  investi¬ 
gated.  In  all  cases,  it  is  found  that  the  rectangular  window 
results  in  the  smallest  NMSE  (normalized  mean  square  er¬ 
ror)  for  the  estimated  quadratic  transfer  functions. 


1.  Introduction 

Volterra  transfer  function  estimation  in  the  frequency  do¬ 
main  using  higher-order  spectra  has  been  studied  for  many 
years  and  applied  to  analysis  of  nonlinear  phenomena.  For 
nonGaussian  inputs,  second-order  Volterra  transfer  function 
estimation  in  the  frequency-domain  utilizes  estimates  of  the 
auto-spectra  of  the  input  up  to  fourth-order  and  cross-spectra 
of  input  and  output  up  to  third-order.  So  the  quality  of 
the  transfer  function  estimation  is  directly  related  to  that  of 
polyspectral  estimation. 

It is well known in classical power spectral estimation [1] that if the signal is non-periodic, or periodic with a data length not equal to an integer number of periods, then spectral leakage exists. Thus conventional data windows are usually employed to reduce the spectral leakage. The research focus of bispectral estimation has been on the variance of the estimate. In that sense, many researchers have suggested using a data window in bispectral estimation in order to reduce the variance of the estimate [2, 3]. K. Sasaki et al. [4] derived the bias and variance bound for the indirect class of bispectral estimation. V. Chandran and S. L. Elgar [5] analyzed the bias and variance of the direct class of bispectral estimation, including leakage effects.

When  a  data  window  is  used,  the  beginning  and  end 
of  the  data  record  to  be  Fourier  transformed  are  attenu¬ 
ated.  Thus  the  mean  square  value,  mean  cube  value,  mean 
quartic  value,  etc.  of  the  windowed  data  are  reduced, 
and,  hence,  the  corresponding  higher-order  moment  spec¬ 
tra  (power  spectrum,  bispectrum,  trispectrum,  etc.)  will  be 
underestimated,  i.e.,  they  will  be  biased.  One  way  to  cor¬ 
rect  (approximately)  for  this  bias  is  to  multiply  the  relevant 
polyspectrum  by  an  appropriate  window  correction  factor 
(w.c.f.). 

Empirical evidence has suggested that the use of a rectangular window (which is equivalent to no data window) yields the “best” Volterra transfer function estimators. This “evidence” is not quantitative in that it is often based on physical insight into the nonlinear system/phenomena being modeled. For this reason, the results of a systematic study of the effects of windows and their corresponding w.c.f.'s on frequency-domain Volterra transfer function estimation are presented in this paper.

In  Sec.  2  we  overview  the  frequency-domain  second- 
order  Volterra  model,  and  the  estimation  of  the  Volterra 
transfer  functions  using  an  appropriate  hierarchy  of  auto- 
and  cross-spectra.  In  Sec.  3  we  consider  two  different 
second-order  systems  characterized  by  known  linear  and 
quadratic  transfer  functions.  The  Volterra  transfer  func¬ 
tions  are  then  estimated  using  a  variety  of  data  windows, 
with  and  without  w.c.f. ’s.  Both  Gaussian  and  exponen¬ 
tially  distributed  random  inputs  are  utilized.  A  normalized 
mean  square  error  (between  the  true  and  estimated  quadratic 
Volterra  transfer  function  coefficients)  is  used  as  a  “good¬ 
ness”  criterion.  In  all  cases  examined,  we  find  that  use  of  a 
rectangular  window  (i.e.,  no  window  at  all)  yields  the  small¬ 
est  normalized  mean  square  error.  The  paper  is  summarized 
and  concluded  in  Sec.  4. 
Figure 1. The conceptual block diagram of the second-order Volterra model in the discrete frequency domain.


2  Frequency-domain  Volterra  model 


The frequency-domain second-order Volterra model at discrete frequency $m$ ($0 \le m \le N/2$) is as follows.

$$Y_M[m] = H_L[m] X[m] + \sum_{p,q} H_Q[p,q]\, X[p]\, X[q]\, \delta_N[p+q-m] \qquad (1)$$

where $X[m]$ is the DFT of the input, $Y_M[m]$ is the DFT of the model output, $H_L[m]$ is the linear transfer function, and $H_Q[p,q]$ is the quadratic transfer function.


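A block diagram of the second-order Volterra model in the frequency domain is shown in Fig. 1. Since the model of (1) is linear in terms of the kernels $H_L$ and $H_Q$, it can be represented in vector multiplication form

$$Y_M[m] = \begin{bmatrix} H_L[m] & \mathbf{H}_Q^T \end{bmatrix} \begin{bmatrix} X[m] \\ \mathbf{X}_Q \end{bmatrix} \qquad (2)$$

where $\mathbf{H}_Q$ and $\mathbf{X}_Q$ are vectors made up of $H_Q[p,q]$ and $X[p]X[q]$, respectively, with $p + q = m$. Minimization of the mean square error $E[|e[m]|^2]$ in Fig. 1 leads to the following normal equation,

$$\begin{bmatrix} E[Y[m] X^*[m]] & E[Y[m] \mathbf{X}_Q^{*}] \end{bmatrix} = \begin{bmatrix} H_L[m] & \mathbf{H}_Q^T \end{bmatrix} \begin{bmatrix} E[X[m] X^*[m]] & E[X[m] \mathbf{X}_Q^{*}] \\ E[\mathbf{X}_Q X^*[m]] & E[\mathbf{X}_Q \mathbf{X}_Q^{*}] \end{bmatrix} \qquad (3)$$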


where $E[\cdot]$ denotes expectation. So we need to estimate a hierarchy of higher-order auto-spectra of the input signal up to 4th order, and cross-spectra of the input and output signals up to 3rd order. Clearly, the goodness of the estimation of the transfer function is directly related to the goodness of the polyspectral estimators.


The use of a data window attenuates the original signal, which results in bias of the polyspectral estimators. The factor $W_p = \left[\frac{1}{N}\sum_n w^2[n]\right]^{-1}$, with $w[n]$ the data window, is well known as the window correction factor (w.c.f.) for power spectral estimation. The generalized window correction factors [2] for higher-order spectral estimation are as follows, where $W_b$ is the w.c.f. for bispectral estimation and $W_t$ is the w.c.f. for 4th-order moment spectral estimation.

$$W_b = \left[\frac{1}{N}\sum_n w^3[n]\right]^{-1} \qquad (4)$$

$$W_t = \left[\frac{1}{N}\sum_n w^4[n]\right]^{-1} \qquad (5)$$

Multiplying the windowed polyspectral estimates by these factors gives bias-compensated polyspectral estimates, in the sense that the corresponding moments (mean square, mean cube, mean quartic values) will be conserved (approximately) between the time and frequency domains.
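As a small numerical illustration, the sketch below computes correction factors for a few conventional windows, assuming each factor is the reciprocal of the window's mean square, mean cube, or mean quartic value (the reading suggested by the moment-conservation argument above). The helper name `window_correction_factors`, the record length of 256, and the NumPy window functions are illustrative choices, not taken from the paper.

```python
import numpy as np

def window_correction_factors(w):
    """Map order (2, 3, 4) to 1 / mean(w**order); an assumed form of the w.c.f."""
    w = np.asarray(w, dtype=float)
    return {order: 1.0 / np.mean(w ** order) for order in (2, 3, 4)}

N = 256  # hypothetical record length
windows = {
    "rectangular": np.ones(N),
    "hamming": np.hamming(N),
    "hanning": np.hanning(N),
    "triangular": np.bartlett(N),
}
for name, w in windows.items():
    wcf = window_correction_factors(w)
    print(name, {order: round(value, 3) for order, value in wcf.items()})
```

Under this assumption the rectangular window has all factors equal to 1, while for tapered windows the factor grows with the spectral order, because higher powers of a taper that never exceeds 1 have smaller means.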


3  Simulation  results 


The  simulations  are  carried  out  for  two  known  second- 
order  nonlinear  systems  using  the  direct  class  of  higher- 
order  spectral  estimation.  The  transfer  function  estimation 
method  is  valid  for  general  input  statistics,  so  we  used  expo¬ 
nentially  distributed  and  normally  distributed  signals.  And 
for  comparison,  we  used  various  conventional  windows[l] 
versus  no  window  (or  effectively  a  rectangular  window). 
For  the  goodness  criteria  of  the  transfer  function  estimators, 
we  use  the  normalized  mean  square  error  (NMSE)  of  the 
quadratic  transfer  function  coefficients  defined  as  follows. 


$$\mathrm{NMSE}(H_Q) = \frac{1}{K} \sum_{p,q} \frac{|H_{Qt}(p,q) - H_{Qe}(p,q)|^2}{|H_{Qt}(p,q)|^2} \qquad (6)$$

Here $H_{Qt}(\cdot)$ and $H_{Qe}(\cdot)$ are the true value and estimated value of $H_Q(\cdot)$, respectively. The index pair $(p, q)$ ranges over all relevant values of discrete frequency pairs in the two-dimensional frequency plane where $H_Q$ is defined, and $K$ is the total number of relevant pairs $(p, q)$. Because of
space  limitations  we  focus  in  this  paper  on  the  relationship 
between  data  windows  and  quadratic  transfer  function  esti¬ 
mation. 
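The goodness criterion can be sketched in a few lines of Python, reading the definition as the average, over the K relevant frequency pairs, of the squared error magnitude divided by the squared magnitude of the true value. The function name `nmse_quadratic` and the optional boolean `mask` selecting the relevant pairs are assumptions for illustration.

```python
import numpy as np

def nmse_quadratic(h_true, h_est, mask=None):
    """Average of |H_true - H_est|^2 / |H_true|^2 over the selected (p, q) pairs."""
    h_true = np.asarray(h_true, dtype=complex)
    h_est = np.asarray(h_est, dtype=complex)
    if mask is None:
        # Hypothetical default: treat every pair in the array as relevant.
        mask = np.ones(h_true.shape, dtype=bool)
    num = np.abs(h_true[mask] - h_est[mask]) ** 2
    den = np.abs(h_true[mask]) ** 2
    return float(np.mean(num / den))

# Toy check: a uniform 10 % amplitude error gives an NMSE of 0.01.
h_true = np.full((4, 4), 2.0)
h_est = np.full((4, 4), 2.2)
value = nmse_quadratic(h_true, h_est)
print(value)
```

Because each term is normalized by the local true value, pairs where the true transfer function is small weigh as heavily as pairs where it is large.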


3.1.  System  1 

The first second-order nonlinear system model [6] in the time domain has the following representation.

$$y[n] = -0.64\,x[n] + x[n-2] + 0.9\,x^2[n] + x^2[n-1] \qquad (7)$$

The frequency domain transfer functions are as follows.

$$H_L[m] = -0.64 + e^{-j4\pi m/N} \qquad (8)$$

$$H_Q[p,q] = 0.9 + e^{-j2\pi (p+q)/N} \qquad (9)$$
Table 1. Normalized mean square error of the quadratic transfer function estimates of system 1.

              Exponential distribution     Gaussian distribution
Window        with w.c.f.   w/o w.c.f.     with w.c.f.   w/o w.c.f.
Rectangular   0.0244        0.0244         0.0144        0.0144
Hamming       0.0673        0.0976         0.0167        0.0418
Triangular    0.0651        0.1354         0.0185        0.0849
Hanning       0.1102        0.1518         0.0222        0.0474


Figure  2.  True  quadratic  transfer  function  of 
the  system  1. 

Here  N  represents  the  FFT  length,  and  the  frequency  indices 
m,p,q  range  from  —N/2  to  N/2  —  1.  The  frequency  axes  in 
the  figures  are  normalized  to  the  FFT  length  N. 

The quadratic transfer function shown in Fig. 2 is strong in the low frequency region and weak in the high frequency region. Note that the values of the quadratic transfer function do not exceed 2 (the maximum is 1.9, from Eq. 9). We are going to compare the estimated transfer functions with this figure.
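The quoted maximum of 1.9 can be checked numerically. The sketch below assumes that the quadratic transfer function of system 1 is H_Q[p,q] = 0.9 + exp(-j2π(p+q)/N), which is what the time-domain terms 0.9x²[n] + x²[n-1] give under the usual DFT delay property; the FFT length of 64 is an arbitrary choice for illustration.

```python
import numpy as np

N = 64  # hypothetical FFT length
idx = np.arange(-N // 2, N // 2)  # frequency indices from -N/2 to N/2 - 1
p, q = np.meshgrid(idx, idx, indexing="ij")

# Assumed quadratic transfer function of system 1 (delay of one sample in x^2).
H_Q = 0.9 + np.exp(-2j * np.pi * (p + q) / N)

peak = np.abs(H_Q).max()    # reached where p + q = 0
floor = np.abs(H_Q).min()   # reached where p + q = +/- N/2
print(peak, floor)
```

The magnitude depends only on p + q, peaking on the line p + q = 0 and falling to its minimum at p + q = ±N/2.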

Fig. 3 shows the cross-bispectral estimate $S_{yxx}$ of the input and output of system 1 with a rectangular window and exponential input. Note the strong peak line along −45°, which has the following representation:

$$S_{yxx}[p,-p] = E[Y[0]\, X^*[p]\, X^*[-p]] = E[Y[0]\, X^*[p]\, X[p]] = E[Y[0]\, |X[p]|^2] \qquad (10)$$

Due to the strong low-frequency quadratic transfer function, the output dc value is large. Furthermore, the inputs used are white (i.e., not band limited). Thus the resulting $S_{yxx}[p,-p]$, which is roughly the input signal power times the output dc value, becomes large. Employing a conventional data window widens the main lobe. Since the peak line is relatively sharp compared to the neighboring strip, the widening of the main lobe can contaminate the transfer function estimation.


Figure  3.  Cross  bispectrum  Syxx  of  system  1 
for  exponentially  distributed  input. 

Fig. 4 and Fig. 5 show the quadratic transfer function estimate for the exponential input case with a rectangular window and a Hamming window with w.c.f., respectively. Fig. 6 and Fig. 7 show the quadratic transfer function estimate for the Gaussian input case with a rectangular window and a Hamming window with w.c.f., respectively. Note that the estimates with the rectangular window are smoother than those with the Hamming window. Also note that the estimates with the Hamming window have a strong discontinuity along a strip centered on the −45° line, which is mainly due to the widening of the main lobe.

The normalized mean square error (NMSE) of the quadratic transfer function estimates for both exponential and normal (Gaussian) distribution inputs over various data windows is presented in Table 1. The table also shows how the NMSE varies with whether we employ a window correction factor (w.c.f.) or not. Note that the w.c.f. for the rectangular window is 1, so the NMSE for a rectangular window with w.c.f. and without w.c.f. are the same. We can observe that in all cases the window correction factor reduces the NMSE, which indicates that the w.c.f. can compensate for the bias of polyspectral estimation. Furthermore, note that the rectangular window gives the smallest normalized mean square error among the conventional data windows.

Let us compare the NMSE of a rectangular window and a Hamming window with w.c.f. The NMSE of the Hamming window with w.c.f. for exponentially distributed input is slightly less than 3 times that of the rectangular window for the same input. The difference is visible when comparing Fig. 4 and Fig. 5. Note the difference in z-axis scale between the two figures. The estimated quadratic transfer function values using the rectangular window in Fig. 4 do not exceed 2.5, whereas those using the Hamming window with w.c.f. have a maximum between 3 and 4, which is quite large compared to the true maximum value of 1.9 (Eq. 9) in Fig. 2. Next consider the Gaussian input case. Although the NMSE of the Hamming window with w.c.f. is almost the same as that of the rectangular window, we observe, when comparing Fig. 6 and Fig. 7, that their appearances are quite different. Most discrepancies are concentrated around the −45° line. So in this case the NMSE alone is not adequate to show the difference in estimation quality.

Figure 4. Quadratic transfer function estimate of the first system using a rectangular window. The input signal is exponentially distributed.

Figure 5. Quadratic transfer function estimate of the first system using a Hamming window and the appropriate w.c.f. The input signal is exponentially distributed.



Figure  6.  Quadratic  transfer  function  estimate 
of  the  first  system  using  Rectangular  window. 
The  input  signal  is  normally  distributed. 


Figure  7.  Quadratic  transfer  function  estimate 
of  the  first  system  using  Hamming  window 
and  appropriate  w.c.f.  The  input  signal  is  nor¬ 
mally  distributed. 


3.2.  System  2 

The next second-order nonlinear system model in the time domain has the following representation.

$$y[n] = -0.7\,x[n] - 0.9\,x[n-2] + 0.8\,x^2[n] - 0.5\,x^2[n-2] \qquad (11)$$

The frequency domain transfer functions are as follows.

$$H_L[m] = -0.7 - 0.9\,e^{-j4\pi m/N} \qquad (12)$$

$$H_Q[p,q] = 0.8 - 0.5\,e^{-j4\pi (p+q)/N} \qquad (13)$$

The  quadratic  transfer  function  is  shown  in  Fig.  8.  We  ob¬ 
serve  that  this  quadratic  transfer  function  is  strong  in  mid¬ 
frequency  range  and  weak  in  the  low  and  high  frequency 
range. 

The normalized mean square error (NMSE) of the quadratic transfer function estimates for both exponential and normal (Gaussian) distribution inputs over various data windows is presented in Table 2. The table also shows how the NMSE depends on whether we employ a window correction factor (w.c.f.) or not. We can observe that the window correction factor reduces the NMSE, and that the rectangular window gives the smallest normalized mean square error among the conventional data windows, with and without w.c.f.'s.

Table 2. Normalized mean square error of the quadratic transfer function estimates of system 2.

              Exponential distribution     Gaussian distribution
Window        with w.c.f.   w/o w.c.f.     with w.c.f.   w/o w.c.f.
Rectangular   0.0138        0.0138         0.0043        0.0043
Hamming       0.0646        0.0919         0.0150        0.0391
Triangular    0.0636        0.1296         0.0172        0.0822
Hanning       0.1001        0.1363         0.0190        0.0482

Figure 8. True quadratic transfer function of system 2.

4.  Conclusions 

In this paper, it was shown that the use of a rectangular window (vs. tapered windows) yields the best estimates of Volterra quadratic transfer functions. Since this result is based on two simulation experiments, it obviously is not general. Nevertheless, it is consistent with empirical observations of those modeling nonlinear physical systems. Space has not permitted a discussion of linear transfer function results. Suffice it to say that both rectangular and conventional windows give comparable results. Also, the w.c.f.'s cancel out in linear transfer function estimation. However, this is not the case for quadratic transfer function estimation. Space limitations have also precluded a discussion of the role of the extended principal domain [7] in quadratic transfer function estimation. These issues will be discussed in a subsequent full-length paper.
ABSTRACT

The  focus  of  this  paper  is  on  Volterra  nonlinear  system 
identification  from  input-output  data.  When  the  system 
is  linear-quadratic  and  the  input  is  Gaussian,  closed-form 
expressions  for  the  kernels  were  derived  by  Tick  based  on 
input-output  cross-cumulants.  However,  there  have  been 
no  known  variance  expressions  for  the  kernel  estimates.  In 
this  paper,  we  analyze  the  performance  of  the  first-  and 
second-order  kernel  estimates  when  the  input  is  zero-mean 
white  Gaussian,  and  the  additive  noise  has  unknown  color 
and  distribution.  Closed-form  variance  expressions  are  pre¬ 
sented  and  verified  by  simulations. 


1.  INTRODUCTION 

For physical systems exhibiting mild nonlinearities, Volterra series modeling has been demonstrated to be a viable approach (see e.g., [1], [3], [6]). A discrete-time Volterra series relates the input $w(n)$ and the output $x(n)$ via

$$x(n) = \sum_{k=1}^{\infty} \sum_{u_1} \cdots \sum_{u_k} h_k(u_1, \ldots, u_k) \prod_{m=1}^{k} w(n - u_m), \quad n = 0, \ldots, N-1, \qquad (1)$$

where $h_k(u_1, \ldots, u_k)$ is called the $k$th-order kernel. In a practical setting, this series is often truncated in kernel order and kernel length. Sandberg has shown that the “doubly finite” Volterra series can uniformly approximate a large class of nonlinear systems [10]. Without loss of generality, we assume that $h_k(u_1, \ldots, u_k)$ is symmetric in its arguments, because we can symmetrize the kernel if otherwise [11].

A common problem encountered in Volterra modeling is to estimate the kernels from input-output data, and several algorithms have been developed [4], [5], [7], [8], [12]. Higher-order statistics (cross-correlations, cumulants, or polyspectra) are natural with nonlinearities and are essential tools in deriving performance analysis results. To the best of the authors' knowledge, there have been no known closed-form variance expressions for the kernel estimates; the primary reason seems to be the complexity of the analysis.

In this paper, we focus on linear-quadratic systems and white Gaussian inputs. We first review Tick's method [12] of estimating the kernels from input-output data and then show their variance expressions. The latter not only allow us to quantify the relative contribution from various system, input, and noise parameters but also establish benchmark performance measures to be compared with alternative algorithms such as in [8], [13].


2.  IDENTIFICATION OF LINEAR AND QUADRATIC KERNELS

A linear-quadratic system (i.e., a second-order Volterra system) is described by

$$x(n) = \sum_{u_1} h_1(u_1)\, w(n - u_1) + \sum_{u_1} \sum_{u_2} h_2(u_1, u_2)\, w(n - u_1)\, w(n - u_2) + v(n), \qquad (2)$$

where $w(n)$ is the input, $x(n)$ is the output, and $v(n)$ is the additive noise. In this paper, we assume that $w(n)$ is zero-mean, i.i.d. Gaussian with variance $\sigma_w^2$, and $v(n)$ is zero-mean, with unknown color and distribution, and is independent of $w(n)$.

The cumulant is the main tool used in this paper to derive kernel and variance expressions. We refer the reader to Section 2.3 of [2] for the definition and properties of cumulants.


2.1.  Identification of the First-Order Kernel

The first-order kernel coefficients are found by computing the cross-cumulant between $x(n)$ and $w(n)$ [12]. Using the multilinearity property of cumulants (see p. 19 in [2]), we find

$$c_{xw}(\tau_1) = \mathrm{cum}\{x(n), w(n - \tau_1)\} \qquad (3)$$

$$= \sum_{u_1} h_1(u_1)\, \mathrm{cum}\{w(n - u_1), w(n - \tau_1)\} \qquad (4)$$

$$\quad + \sum_{u_1} \sum_{u_2} h_2(u_1, u_2)\, \mathrm{cum}\{w(n - u_1) w(n - u_2), w(n - \tau_1)\} \qquad (5)$$

$$\quad + \mathrm{cum}\{v(n), w(n - \tau_1)\} \qquad (6)$$

$$= \sum_{u_1} h_1(u_1)\, \mathrm{cum}\{w(n - u_1), w(n - \tau_1)\} \qquad (7)$$

$$= \sum_{u_1} h_1(u_1)\, \sigma_w^2\, \delta(u_1 - \tau_1) \qquad (8)$$

$$= \sigma_w^2\, h_1(\tau_1). \qquad (9)$$

The cumulant in (5) is zero because $w(n)$ is zero-mean Gaussian, whereas the cumulant in (6) is zero because $w(n)$ and $v(n)$ are mutually independent and zero-mean. Therefore, $h_1(\tau_1)$ is expressed in terms of the input-output cross-cumulant as

$$h_1(\tau_1) = \frac{c_{xw}(\tau_1)}{\sigma_w^2}. \qquad (10)$$
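2.2.  Identification of the Second-Order Kernel

The second-order kernel coefficients are found from the cross-cumulant between $x(n)$ and two copies of $w(n)$ [12]:

$$c_{xww}(\tau_1, \tau_2) = \mathrm{cum}\{x(n), w(n - \tau_1), w(n - \tau_2)\}$$

$$= \sum_{u_1} h_1(u_1)\, \mathrm{cum}\{w(n - u_1), w(n - \tau_1), w(n - \tau_2)\} \qquad (11)$$

$$\quad + \sum_{u_1} \sum_{u_2} h_2(u_1, u_2)\, \mathrm{cum}\{w(n - u_1) w(n - u_2), w(n - \tau_1), w(n - \tau_2)\} \qquad (12)$$

$$\quad + \mathrm{cum}\{v(n), w(n - \tau_1), w(n - \tau_2)\} \qquad (13)$$

$$= \sum_{u_1} \sum_{u_2} h_2(u_1, u_2)\, \mathrm{cum}\{w(n - u_1) w(n - u_2), w(n - \tau_1), w(n - \tau_2)\} \qquad (14)$$

$$= \sum_{u_1} \sum_{u_2} h_2(u_1, u_2) \left[\sigma_w^4\, \delta(u_1 - \tau_1)\, \delta(u_2 - \tau_2) + \sigma_w^4\, \delta(u_1 - \tau_2)\, \delta(u_2 - \tau_1)\right] \qquad (15)$$

$$= \sigma_w^4 \left[h_2(\tau_1, \tau_2) + h_2(\tau_2, \tau_1)\right] \qquad (16)$$

$$= 2 \sigma_w^4\, h_2(\tau_1, \tau_2). \qquad (17)$$

The Leonov-Shiryaev formula [2, p. 10] is used to obtain (15) from (14). We have also used the kernel symmetry assumption, i.e., $h_2(\tau_1, \tau_2) = h_2(\tau_2, \tau_1)$, to simplify (16) to (17). Therefore, $h_2(\tau_1, \tau_2)$ is expressed in terms of the input-output cross-cumulant as follows:

$$h_2(\tau_1, \tau_2) = \frac{c_{xww}(\tau_1, \tau_2)}{2 \sigma_w^4}. \qquad (18)$$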
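The two closed-form relations, h1(τ1) = c_xw(τ1)/σ_w² and h2(τ1,τ2) = c_xww(τ1,τ2)/(2σ_w⁴), can be tried out on synthetic data. The following Python sketch simulates a linear-quadratic system driven by white Gaussian input and replaces the cumulants by sample averages. The helper names (`simulate`, `delayed`, `estimate_kernels`), the record length, and the kernel values are illustrative assumptions, and the sketch is not the authors' code.

```python
import numpy as np

def simulate(w, h1, h2, noise_std, rng):
    """Output x(n) of a linear-quadratic system with zero initial conditions."""
    N = len(w)
    x = noise_std * rng.standard_normal(N)           # additive noise v(n)
    for u, c in enumerate(h1):                       # linear part
        x[u:] += c * w[: N - u]
    for (u1, u2), c in np.ndenumerate(h2):           # quadratic part
        m = max(u1, u2)
        x[m:] += c * w[m - u1 : N - u1] * w[m - u2 : N - u2]
    return x

def delayed(w, tau):
    """w(n - tau), with zeros before the start of the record."""
    out = np.zeros_like(w)
    out[tau:] = w[: len(w) - tau]
    return out

def estimate_kernels(x, w, var_w, len1, len2):
    """Sample cross-cumulants scaled by var_w and 2 * var_w**2."""
    N = len(w)
    h1_hat = np.array(
        [np.sum(x * delayed(w, t)) / (N * var_w) for t in range(len1)]
    )
    xc = x - x.mean()  # the third-order cumulant needs the mean of x removed
    h2_hat = np.array(
        [[np.sum(xc * delayed(w, t1) * delayed(w, t2)) / (2 * N * var_w**2)
          for t2 in range(len2)] for t1 in range(len2)]
    )
    return h1_hat, h2_hat

rng = np.random.default_rng(0)
h1 = np.array([0.8, 1.2, 0.4])              # illustrative kernel values
h2 = np.array([[1.0, 0.5], [0.5, 0.3]])
w = rng.standard_normal(200_000)            # unit-variance white Gaussian input
x = simulate(w, h1, h2, noise_std=np.sqrt(0.1), rng=rng)
h1_hat, h2_hat = estimate_kernels(x, w, var_w=1.0, len1=3, len2=2)
print(np.round(h1_hat, 2))
print(np.round(h2_hat, 2))
```

With a long record the estimates land close to the true kernels, including the diagonal entries of the second-order kernel, where removing the mean of x is what cancels the extra constant term.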


3.  PERFORMANCE ANALYSIS OF KERNEL ESTIMATES

In practice, we estimate the kernels from $N$ samples of input-output data. It is important to know the performance of these estimates under various input and system conditions. Closed-form variance expressions will also enable us to infer the data length necessary to meet certain (low) kernel variance requirements. Due to space limitations, we present here only the final variance expressions of the kernel estimates; detailed derivations can be found in [9]. For tractability of analysis, we assume that $\sigma_w^2$ is known; this is the case when we have control of the input. Otherwise, we estimate $\sigma_w^2$ from $w^2(n)$.

3.1.  The First-Order Kernel

The cross-cumulant used in calculating the first-order kernel is estimated from a finite amount of input-output data as

$$\hat{c}_{xw}(\tau_1) = \frac{1}{N} \sum_{n=0}^{N-1} x(n)\, w(n - \tau_1). \qquad (19)$$

Inserting (19) into (10), we find

$$\hat{h}_1(\tau_1) = \frac{\hat{c}_{xw}(\tau_1)}{\sigma_w^2} = \frac{1}{N \sigma_w^2} \sum_{n=0}^{N-1} x(n)\, w(n - \tau_1). \qquad (20)$$

Now substitute (2) into (20) to obtain

$$\hat{h}_1(\tau_1) = \frac{1}{N \sigma_w^2} \Bigg[ \sum_{n=0}^{N-1} \sum_{u_1} h_1(u_1)\, w(n - u_1)\, w(n - \tau_1) \qquad (21)$$

$$\quad + \sum_{n=0}^{N-1} \sum_{u_1} \sum_{u_2} h_2(u_1, u_2)\, w(n - u_1)\, w(n - u_2)\, w(n - \tau_1) \qquad (22)$$

$$\quad + \sum_{n=0}^{N-1} v(n)\, w(n - \tau_1) \Bigg]. \qquad (23)$$

The variance expression of the above linear kernel estimate can be shown to contain eight terms [9]:

var{hi(ri)}  —  ^y^^i('Wi)  +  -'^1)
Ul  til
+  ^  IZ «2)  +  ^
il  U2
-  2
y^/t2(m,ri)
y^y^Mm!m)^2(M2,ri)
Ul  U2
XI  )  n  +  Ml  -  M2)
Ul  ii2
^2
Xh2{U2,Ti+U2-Ul)  +  j^.  (24)

4^
N
y^h2(m,^i)
L  Ul

3.2.  The Second-Order Kernel

The second-order cross-cumulant is estimated from

$$\hat{c}_{xww}(\tau_1, \tau_2) = \frac{1}{N} \sum_{n=0}^{N-1} (x(n) - \hat{m}_x)\, w(n - \tau_1)\, w(n - \tau_2), \qquad (25)$$

where $\hat{m}_x = N^{-1} \sum_{n=0}^{N-1} x(n)$. To simplify the analysis, we substitute $\hat{m}_x$ by

$$m_x = E[x(n)] = \sum_{u} h_2(u, u)\, \sigma_w^2. \qquad (26)$$

The resulting variance will be slightly larger than the actual variance with $\hat{m}_x$, but our simulations show that the difference is small.

Substituting (25) into (18), we find

$$\hat{h}_2(\tau_1, \tau_2) = \frac{1}{2 \sigma_w^4}\, \hat{c}_{xww}(\tau_1, \tau_2) = \frac{1}{2 N \sigma_w^4} \Bigg[ \sum_{n=0}^{N-1} \sum_{u} h_1(u)\, w(n - u)\, w(n - \tau_1)\, w(n - \tau_2) \qquad (27)$$

$$\quad + \sum_{n=0}^{N-1} \sum_{u_1} \sum_{u_2} h_2(u_1, u_2)\, w(n - u_1)\, w(n - u_2)\, w(n - \tau_1)\, w(n - \tau_2) \qquad (28)$$

$$\quad + \sum_{n=0}^{N-1} v(n)\, w(n - \tau_1)\, w(n - \tau_2) \qquad (29)$$

$$\quad - \sum_{n=0}^{N-1} \sum_{u} h_2(u, u)\, \sigma_w^2\, w(n - \tau_1)\, w(n - \tau_2) \Bigg]. \qquad (30)$$
(30) 
The variance expression of the above second-order kernel
estimate can be shown to contain twenty-eight terms [9]:

var{h2(ri,r2)}  = 

U 

+  2jVff^  fti(2n  '  n}ftii2n-Ti) 

1 


(5(ri  -T2) 


4Nal 

+j^y^^hi{u)hi{Ti)5{Ti  -T2) 

^  U 

+  ^  £  Z)  ^2  (lil ,  «2)  +  |:  52  ^*2  (“1  ’ 


Ul  W2 


N 


y~"fe|(m,T2) 


+^^ft2(wi,n)/i2(ri,2r2  -ui) 

Ul 

^/i2(«i,'r2)h2(2ri  -■ui,r2) 

Ul 

’’’ivZ  h2(ni,n)/i2(wi  +ri  —  r2,r2) 

+  ~  h2{UijT2)h2{ui  -^T2  —  Ti,ri) 

Ul 

+  ^  ^h2('Ui,ri)h2(ri  +r2  -  m,r2) 

Ul 

+  ^  ^/i2(?/i,r2)h2(ri  +r2  -ui.n) 

Ul 

+  ^  h2('iti,  2ri  r2)h2{ui  -f  r2  “  n,  2r2  —  n) 

Ul 

-1-^  /i2(«i,  2r2  —  ri)h2(iii  +  n  —  T2,2ri  —  r2) 

Ul 

^  ^2 (-lii ,  til  +  T2  -  ri) 


Ul 


x/i2(2ri  —  tfcijTi  +  r2  —  ifci) 


Ul 


x/i2(2r2  —  Ul,  ri  -h  r2  “  ui) 

+5^EE  h2(ui,U2)/l2(ul  +T2-  n,U2  4*  r2  -Ti) 


Ul  U2 


/i2(ri,U2)h2(ui,ui  -  U2  +Ti)5(ri  -  T2) 


Ul  U2 


+ri?EEE  h2(ui,U2)/l2(ui  —  U2  +  Ul,  IJl) 


Ul  U2  Ul 

xJ(ri  -  T2) 


^h2(u,u) 


1 

‘4  AT 


T  2 


y^/l2(ui,Ul) 


^(n  -  r2). 


(31) 


3.3.  Observations 

We make the following observations regarding the variance expressions (24) and (31).

1.  Since $\mathrm{var}\{\hat{h}_1(\tau_1)\}$ and $\mathrm{var}\{\hat{h}_2(\tau_1, \tau_2)\}$ are $O(N^{-1})$, the kernel estimates are consistent.

2.  We make no assumption on the color or distribution of the additive noise.

3.  Increasing the additive noise variance $\sigma_v^2$ increases the variance of both the first- and second-order kernel estimates.

4.  The variance of the input sequence $\sigma_w^2$ has significant contributions to the variance expressions. Interestingly, it affects the first- and second-order kernels in opposite fashion: the variance of $\hat{h}_1(\tau_1)$ tends to increase while the variance of $\hat{h}_2(\tau_1, \tau_2)$ tends to decrease with increasing $\sigma_w^2$.

5.  The variance of $\hat{h}_2(\tau_1, \tau_1)$ contains additional terms as compared to that with $\tau_1 \neq \tau_2$. Therefore, diagonal kernel estimates tend to have larger variance.

6.  Even for $\tau_1$ such that $h_1(\tau_1) = 0$, we find $\mathrm{var}\{\hat{h}_1(\tau_1)\} \neq 0$ due to $\tau_1$-independent terms such as $N^{-1} \sum_{u_1} h_1^2(u_1)$ in (24). Similarly, even for $(\tau_1, \tau_2)$ with $h_2(\tau_1, \tau_2) = 0$, we will have $\mathrm{var}\{\hat{h}_2(\tau_1, \tau_2)\} \neq 0$ due to constant terms such as $(4 N \sigma_w^2)^{-1} \sum_{u} h_1^2(u)$ in (31).

4.  SIMULATIONS 

The linear-quadratic system is described as in (2) with the following parameters: $h_1(0) = 0.8$, $h_1(1) = 1.2$, $h_1(2) = 0.4$, $h_2(0,0) = 1$, $h_2(0,1) = h_2(1,0) = 0.5$, and $h_2(1,1) = 0.3$. The input $w(n)$ is zero-mean, white Gaussian with variance $\sigma_w^2 = 1$. Additive noise $v(n)$ is zero-mean, white Gaussian noise with variance $\sigma_v^2 = 0$ or $\sigma_v^2 = 0.1$. We first formed $\hat{h}_1(\tau_1)$ and $\hat{h}_2(\tau_1,\tau_2)$ from $N$ samples of input-output data and then obtained the empirical variances of $\hat{h}_1(\tau_1)$ and $\hat{h}_2(\tau_1,\tau_2)$ based on 500 independent realizations. In Figs. 1-4, we display the empirical (solid line) and theoretical (dotted line) $N\,\mathrm{var}\{\hat{h}_1(\tau_1)\}$ (Figs. 1 and 2) and $N\,\mathrm{var}\{\hat{h}_2(\tau_1,\tau_2)\}$ (Figs. 3 and 4); they are seen to agree fairly well.
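The simulation setup above can be sketched in a few lines. The sketch below is only illustrative: it assumes that (2) has the usual linear-quadratic form $y(n) = \sum_{\tau} h_1(\tau) w(n-\tau) + \sum_{\tau_1,\tau_2} h_2(\tau_1,\tau_2) w(n-\tau_1) w(n-\tau_2) + v(n)$, and it uses the standard cross-correlation estimators for white Gaussian input, which may differ in detail from the estimators analysed in the paper. The function names (`lagged`, `simulate_lq`, `estimate_kernels`) are hypothetical.

```python
import numpy as np

def lagged(w, max_lag):
    """Return W with W[t, n] = w[n - t] (zero where n < t)."""
    n = len(w)
    W = np.zeros((max_lag + 1, n))
    for t in range(max_lag + 1):
        W[t, t:] = w[: n - t]
    return W

def simulate_lq(h1, h2, w, v):
    """Assumed model: linear part + quadratic part + additive noise."""
    W1 = lagged(w, len(h1) - 1)
    W2 = lagged(w, h2.shape[0] - 1)
    linear = h1 @ W1
    quadratic = np.einsum("ij,in,jn->n", h2, W2, W2)
    return linear + quadratic + v

def estimate_kernels(w, y, var_w, len1, len2):
    """Cross-correlation kernel estimates (assumed, for white Gaussian input)."""
    n = len(w)
    W1 = lagged(w, len1 - 1)
    W2 = lagged(w, len2 - 1)
    h1_hat = (W1 @ y) / (n * var_w)
    y_centered = y - y.mean()  # removes the mean created by the quadratic part
    h2_hat = np.einsum("in,jn,n->ij", W2, W2, y_centered) / (2.0 * n * var_w**2)
    return h1_hat, h2_hat

rng = np.random.default_rng(0)
h1 = np.array([0.8, 1.2, 0.4])
h2 = np.array([[1.0, 0.5], [0.5, 0.3]])
N = 200_000
w = rng.standard_normal(N)                 # input variance 1
v = np.sqrt(0.1) * rng.standard_normal(N)  # noise variance 0.1
y = simulate_lq(h1, h2, w, v)
h1_hat, h2_hat = estimate_kernels(w, y, 1.0, len(h1), h2.shape[0])
print(h1_hat)
print(h2_hat)
```

Repeating the last block over independent realizations and taking the sample variance of each entry, multiplied by $N$, gives the kind of empirical curve that the figures compare with the theory.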

5.  CONCLUSIONS 

Analytical variance expressions of the kernel estimates are obtained for linear-quadratic systems with white Gaussian input. The estimates are shown to be consistent. The variance expressions describe explicit dependence on the data length, input variance, additive noise variance, and true kernel values. Our simulation results agree well with those predicted by the theory.

Figure 1. First-order kernel estimates, $\sigma_v^2 = 0$.

Figure 2. First-order kernel estimates, $\sigma_v^2 = 0.1$.


Stochastic  Resonance  in  a  Discrete  Time  Nonlinear  SETAR(1,2,0,0)  Model 


Steeve  Zozor  and  Pierre-Olivier  Amblard 
CEPHAG-ENSIEG  UPRESA  CNRS  5086 
B.P.  46,  38402  Saint  Martin  d’Heres  Cedex 
Steeve.Zozor@cephag.inpg.fr  Bidou.Amblard@cephag.inpg.fr 


Abstract 

We present in this paper the stochastic resonance phenomenon in a discrete-time context. Indeed, stochastic resonance has been commonly investigated in continuous time. Analytical results given by a simple bistable nonlinear SETAR(1,2,0,0) model are studied. Then, the ability of such a system to be used in signal processing is discussed.


1.  Introduction 

Stochastic Resonance (SR) is a physical phenomenon occurring in bistable dynamic systems excited by random noise and a pure frequency. Suppose that a particle is moving in the bistable potential $U(x)$ pictured in Figure 1. Under the assumption of strong friction, the equation of motion reduces to $\dot{x}(t) = -\nabla_x U(x)$. In that case, the particle will fall and stay in one well of the potential. If a random noise is added, $\dot{x}(t) = -\nabla_x U(x) + n(t)$, the particle will have a non-zero probability to jump from one well to the other. When thresholded, the output signal $x(t)$ will be roughly a telegraph signal. Now, if we add a sinusoid to the input, the potential will be modulated, and provided the parameters (sine and noise amplitudes) are finely tuned, a cooperative effect between the sinusoid and the noise takes place: the particle will jump from one well to the other at the frequency of the sine. The interesting fact is that the output Signal-to-Noise Ratio (SNR) at the frequency of the sinusoid presents a maximum when plotted against the variance of the input noise (see Figures 3 and 5).

This fact has led several researchers to examine the ability of SR to detect small-amplitude periodic signals [3, 4, 5], which motivated this study. The theory of SR in continuous-time systems is difficult. However, under some hypotheses, several approximate theories exist that explain the phenomenon [2]. In this paper, we examine SR in a discrete-time context, and this is the first attempt to do so. The model we use is a nonlinear SETAR(1,2,0,0) model [7]. We consider two systems, which read

$$\begin{cases} \text{SETAR(1,2,0,0):} & x_n = \Phi(x_{n-1}) + e_n \\ \text{output of system 1:} & x_n \\ \text{output of system 2:} & y_n = \Psi(x_n) \end{cases}$$

where $\Phi(x) = c\,\mathrm{sign}(x)$ and $\Psi(x) = V\,\mathrm{sign}(x)$. The input is $e_n = b_n + \varepsilon\sin(2\pi n\lambda_0 + \varphi_0)$, where $b_n$ is a Gaussian zero-mean white noise of variance $\sigma^2$. $\varepsilon$ is supposed to be small compared to $\sigma$, and $\varepsilon < c$ (with only the modulation, system 1 and system 2 are not able to jump from $\pm c$ to $\mp c$). System 2 can be viewed as a discrete Schmitt trigger, known in continuous time to provide SR [1]. In the following section, we give the theoretical study of those systems, which is difficult because of the presence of the sinusoid. Section 3 presents a discussion of the results and of the use of SR in signal processing.


2.  Theoretical  study  of  the  systems 
2.1.  System  1 

The output of the SETAR(1,2,0,0) is Markovian; thus, we can use the Chapman-Kolmogorov equation to evaluate the probability density function (pdf) of $x_n$, $f_n(x)$,

$$f_n(x) = \int_{\mathbb{R}} f_b\big(x - \Phi(y) - s_n\big)\, f_{n-1}(y)\, dy \tag{1}$$

where $f_b$ denotes the pdf of the noise $b_n$ and where $s_n = \varepsilon\sin(2\pi n\lambda_0 + \varphi_0)$ represents the sine. Denoting $\alpha_n$ as

$$\alpha_n = \int_0^{+\infty} f_n(x)\, dx \tag{2}$$

the probability density function can be expressed as

$$f_n(x) = (1 - \alpha_{n-1})\, f_b(x + c - s_n) + \alpha_{n-1}\, f_b(x - c - s_n) \tag{3}$$

Now integrating (3) over $[0; +\infty[$ leads to

$$\alpha_n = K_n\,\alpha_{n-1} + \mathrm{erf}\Big(\frac{-c + s_n}{\sigma}\Big) \tag{4}$$
0-8186-8005-9/97  $10.00  ©  1997  IEEE 




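where $K_n = \mathrm{erf}\big(\frac{c + s_n}{\sigma}\big) - \mathrm{erf}\big(\frac{-c + s_n}{\sigma}\big)$ and $\mathrm{erf}(u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$. Under the assumption $\varepsilon \ll \sigma$, $\mathrm{erf}\big(\frac{\pm c + s_n}{\sigma}\big) \approx \mathrm{erf}\big(\pm\frac{c}{\sigma}\big) + f_b(c)\, s_n$, then $\alpha_n$ verifies the following equation

$$\alpha_n = K\alpha_{n-1} + \mathrm{erf}\Big(-\frac{c}{\sigma}\Big) + L s_n \tag{5}$$

where $K = \mathrm{erf}\big(\frac{c}{\sigma}\big) - \mathrm{erf}\big(-\frac{c}{\sigma}\big)$ and $L = f_b(c)$. (5) can easily be studied, and it can be shown that $\alpha_n$ is asymptotically expressed as

$$\alpha_n = \frac{1}{2} + \frac{L}{|a|}\,\tilde{s}_n \tag{6}$$

where $a = 1 - K e^{-2i\pi\lambda_0}$ and where $\tilde{s}_n = \varepsilon\sin(2\pi n\lambda_0 + \varphi_0 - \mathrm{Arg}(a))$.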
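As a numerical check of the asymptotic solution of the linear recursion (5), the following sketch iterates $\alpha_n = K\alpha_{n-1} + \mathrm{erf}(-c/\sigma) + L s_n$ and compares the result with $\frac{1}{2} + \frac{L}{|a|}\varepsilon\sin(2\pi n\lambda_0 + \varphi_0 - \mathrm{Arg}(a))$, $a = 1 - Ke^{-2i\pi\lambda_0}$. Here $\mathrm{erf}$ is taken to be the standard normal distribution function, as in the text; the parameter values and function names are illustrative assumptions.

```python
import cmath
import math

def normal_cdf(u):
    """erf(u) in the paper's sense: the standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def constants(c, sigma):
    K = normal_cdf(c / sigma) - normal_cdf(-c / sigma)
    L = math.exp(-c**2 / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
    return K, L

def alpha_recursion(c, sigma, eps, lam0, phi0, n_steps, alpha0=0.5):
    """Iterate (5); the returned list holds alpha_1, ..., alpha_{n_steps}."""
    K, L = constants(c, sigma)
    alpha, out = alpha0, []
    for n in range(1, n_steps + 1):
        s_n = eps * math.sin(2.0 * math.pi * n * lam0 + phi0)
        alpha = K * alpha + normal_cdf(-c / sigma) + L * s_n
        out.append(alpha)
    return out

def alpha_asymptotic(c, sigma, eps, lam0, phi0, n):
    """Closed form: 1/2 + (L / |a|) * s_tilde_n."""
    K, L = constants(c, sigma)
    a = 1.0 - K * cmath.exp(-2j * math.pi * lam0)
    s_tilde = eps * math.sin(2.0 * math.pi * n * lam0 + phi0 - cmath.phase(a))
    return 0.5 + (L / abs(a)) * s_tilde

params = dict(c=1.0, sigma=1.0, eps=0.1, lam0=0.04, phi0=0.3)
alphas = alpha_recursion(n_steps=500, **params)
gap = abs(alphas[-1] - alpha_asymptotic(n=500, **params))
print(gap)
```

The transient decays like $K^n$, so after a few hundred steps the recursion and the closed form agree to machine precision.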

Equation (6) shows that the signal $x_n$ is almost cyclostationary, and that system 1 exhibits a beating phenomenon: the weights $1 - \alpha_{n-1}$ and $\alpha_{n-1}$ of $f_b(x \pm c - s_n)$ in (3) pulse. This phenomenon is coupled to a moving-mean phenomenon: the locations of $f_b(x \pm c - s_n)$ periodically move.

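The autocorrelation function of $x_n$, $\Gamma(n,q) = E[x_n x_{n+q}]$, is then expressed as

$$\Gamma(n,q) = \iint_{\mathbb{R}^2} x_n\, x_{n+q}\, f(x_n, x_{n+q})\, dx_n\, dx_{n+q} \tag{7}$$

where $f(x_n, x_{n+q})$ denotes the joint probability density function of $(x_n, x_{n+q})$.

Without additive modulation ($\varepsilon = 0$), the signal $x_n$ is stationary and the correlation function is denoted $\Gamma_0(q)$. Considering that $q \geq 1$, $\Gamma_0(q+1)$ is given by

$$\Gamma_0(q+1) = \iint_{\mathbb{R}^2} x_n\, x_{n+q+1}\, f(x_n, x_{n+q+1})\, dx_{n+q+1}\, dx_n = \iint_{\mathbb{R}^2} x_n\, \Phi(x_{n+q})\, f(x_n, x_{n+q})\, dx_{n+q}\, dx_n \tag{8}$$

where we have used the Chapman-Kolmogorov equation within the integral. Continuing the recursion we get

$$\begin{aligned} \Gamma_0(q+1) &= \iiint_{\mathbb{R}^3} x_n\, f(x_n, x_{n+q-1})\, \Phi(x_{n+q})\, f_b\big(x_{n+q} - \Phi(x_{n+q-1})\big)\, dx_{n+q}\, dx_{n+q-1}\, dx_n \\ &= \iint_{\mathbb{R}^2} x_n\, c\Big[\mathrm{erf}\Big(\frac{\Phi(x_{n+q-1})}{\sigma}\Big) - \mathrm{erf}\Big(-\frac{\Phi(x_{n+q-1})}{\sigma}\Big)\Big]\, f(x_n, x_{n+q-1})\, dx_{n+q-1}\, dx_n \end{aligned}$$

Using the following equality

$$c\Big[\mathrm{erf}\Big(\frac{\Phi(x_{n+q-1})}{\sigma}\Big) - \mathrm{erf}\Big(-\frac{\Phi(x_{n+q-1})}{\sigma}\Big)\Big] = c\,\mathrm{sign}(x_{n+q-1})\,K$$

we get

$$\Gamma_0(q+1) = K \iint_{\mathbb{R}^2} x_n\, \Phi(x_{n+q-1})\, f(x_n, x_{n+q-1})\, dx_{n+q-1}\, dx_n \tag{9}$$

Using (8) and (9) it can be concluded that for $q \geq 1$ the correlation function is recursively expressed as

$$\Gamma_0(q+1) = K\,\Gamma_0(q) \tag{10}$$

We evaluate $\Gamma_0(1) = Kc^2 + 2cL\sigma^2$ and $\Gamma_0(0) = c^2 + \sigma^2$. The autocorrelation function of $x_n$ in the absence of modulation is then given by

$$\begin{cases} \Gamma_0(0) = c^2 + \sigma^2 \\ \Gamma_0(q) = (Kc^2 + 2cL\sigma^2)\, K^{q-1} \quad \text{for } q \geq 1 \end{cases} \tag{11}$$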
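The closed form for the unmodulated autocorrelation can be compared with a direct simulation of $x_n = c\,\mathrm{sign}(x_{n-1}) + b_n$. The sketch below is a minimal illustration under assumed parameter values ($c = \sigma = 1$); the function names are hypothetical and the sample size is chosen only so that the Monte Carlo error is small.

```python
import math
import numpy as np

def simulate_setar(c, sigma, n_samples, rng):
    """x_n = c*sign(x_{n-1}) + b_n with white Gaussian b_n (no modulation)."""
    b = sigma * rng.standard_normal(n_samples)
    x = np.empty(n_samples)
    prev = c  # arbitrary start in the right-hand well
    for n in range(n_samples):
        prev = (c if prev >= 0 else -c) + b[n]
        x[n] = prev
    return x

def gamma0_theory(c, sigma, q):
    """Gamma_0(0) = c^2 + sigma^2; Gamma_0(q) = (K c^2 + 2 c L sigma^2) K^(q-1)."""
    K = math.erf(c / (sigma * math.sqrt(2.0)))  # Phi(c/sigma) - Phi(-c/sigma)
    L = math.exp(-c**2 / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
    if q == 0:
        return c**2 + sigma**2
    return (K * c**2 + 2.0 * c * L * sigma**2) * K ** (q - 1)

def gamma0_empirical(x, q):
    """Time-average estimate of E[x_n x_{n+q}]."""
    return float(np.mean(x[: len(x) - q] * x[q:]))

rng = np.random.default_rng(1)
x = simulate_setar(c=1.0, sigma=1.0, n_samples=200_000, rng=rng)
for q in range(4):
    print(q, gamma0_empirical(x, q), gamma0_theory(1.0, 1.0, q))
```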

For $\varepsilon \neq 0$, $x_n$ is almost cyclostationary, and proceeding in the same way (which is somewhat more complicated in this case) we show that the autocorrelation function of $x_n$ is expressed as

$$\Gamma(n,q) = \Gamma_0(q) + \mu_{n+q}\,\mu_n \tag{12}$$

where

$\mu_n = M\varepsilon\sin(2\pi n\lambda_0 + \varphi_0 - \varphi_1)$
$M^2 = |a + ib|^2$
$\varphi_1 = \mathrm{Arg}(a + ib)$ (13)

^  —  —  ^jTsin(2;rAo) 

,  ^  =  1  +  ^p(cos(2;rAo)  —  A) 

It can be noticed that $E[x_n] = \mu_n$. To get a spectrum at the zero cyclic frequency, $\Gamma(n,q)$ is averaged over $n$. This leads to

$$\Gamma(q) = \Gamma_0(q) + \frac{M^2\varepsilon^2}{2}\cos(2\pi q\lambda_0) \tag{14}$$

Then, the SNR of the output is evaluated as the ratio between the local power of the output sine around $\lambda_0$ and the local power spectral density of the output noise at $\lambda_0$. The output SNR, $\mathrm{SNR}_o$, is thus the ratio of the sine power appearing in (14) to $S_0(\lambda_0)$, where $S_0(\lambda) = \sum_{q=-\infty}^{+\infty} \Gamma_0(q)\, e^{-2i\pi q\lambda}$ is the Fourier transform of $\Gamma_0(q)$, the correlation of the output noise.

$\mathrm{SNR}_o$ depends on the frequency $\lambda_0$ and on the variance of the input noise $\sigma^2$. But $\mathrm{SNR}_o$ is not inversely proportional to $\sigma^2$ because of the nonlinearity of the system. The interacting effect between the sine and the noise can be seen through the SNR, and also through the amplification of the cosine in the autocorrelation function. These results are discussed further in section 3.


2.2.  System  2 

The output $y_n$ of the second system is a two-state signal, and is also Markovian. Then, $y_n$ is characterized by its probability vector $p(n)$ and its transition matrix $P(n)$ from step $n-1$ to $n$. In our context, we assume that $\varepsilon$ is small compared to $\sigma$. Then, using (6) we get

$$p(n) = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \frac{L}{|a|}\,\tilde{s}_n \begin{bmatrix} -1 \\ 1 \end{bmatrix} \tag{15}$$


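$y_n$ is also almost cyclostationary, and is also subject to a beating phenomenon. Nevertheless, the moving-mean phenomenon seen in $x_n$ has been cancelled by the thresholding. The components of the transition matrix $P(n)$ are expressed by $P_{i,j}(n) = \Pr[y_n = (2i-3)V \mid y_{n-1} = (2j-3)V]$ where $i, j \in \{1, 2\}$ (we insist that with our notation $p(n) = P(n)\,p(n-1)$). Hence,

$$P_{1,1}(n) = \frac{\Pr[x_n < 0 \text{ and } x_{n-1} < 0]}{\Pr[x_{n-1} < 0]} \tag{16}$$

This leads to

$$P_{1,1}(n) = \frac{1}{1 - \alpha_{n-1}} \int_{-\infty}^{0}\!\int_{-\infty}^{0} f(x_{n-1}, x_n)\, dx_{n-1}\, dx_n \tag{17}$$

The joint probability density function of $(x_{n-1}, x_n)$ is expressed as $f(x_{n-1}, x_n) = f(x_n \mid x_{n-1})\, f(x_{n-1})$, i.e. $f(x_{n-1}, x_n) = f_b\big(x_n - \Phi(x_{n-1}) - s_n\big)\, f(x_{n-1})$. Hence $P_{1,1}(n)$ is expressed as

$$P_{1,1}(n) = \mathrm{erf}\Big(\frac{c - s_n}{\sigma}\Big) \tag{18}$$

i.e.

$$P_{1,1}(n) \approx \mathrm{erf}\Big(\frac{c}{\sigma}\Big) - L s_n \tag{19}$$

Similarly

$$P_{2,2}(n) \approx \mathrm{erf}\Big(\frac{c}{\sigma}\Big) + L s_n \tag{20}$$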


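It can be seen that $\mathrm{erf}\big(\pm\frac{c}{\sigma}\big) = \frac{1}{2}(1 \pm K)$. $P(n)$ is a stochastic matrix, hence $P(n)$ is expressed as

$$P(n) = P_0 + L\,D\,s_n \tag{21}$$

where

$$P_0 = \frac{1+K}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \frac{1-K}{2}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \tag{22}$$

and where

$$D = \begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix} \tag{23}$$

Notice that $P_0$ is the transition matrix without modulation ($\varepsilon = 0$). In the presence of the modulation, this transition matrix is modulated.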

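The transition matrix from step $n$ to step $n+q$ is then expressed as $\Pi(n,q) = P(n+q)\cdots P(n+1)$, that is approximately $\Pi(n,q) \approx P_0^{\,q} + L\sum_{k=0}^{q-1} P_0^{\,q-1-k}\, D\, P_0^{\,k}\, s_{n+1+k}$. Using the following equations (which are evident)

$$P_0 D = K D, \qquad D P_0 = D \tag{24}$$

it can be seen that

$$\Pi(n,q) = P_0^{\,q} + L\Big(\sum_{k=0}^{q-1} K^{q-1-k}\, s_{n+1+k}\Big) D \tag{25}$$

Thus, we get

$$\Pi(n,q) = P_0^{\,q} + \frac{L}{|a|}\big(\tilde{s}_{n+q} - K^q\,\tilde{s}_n\big) D \tag{26}$$

Using (22) it can be seen that

$$P_0^{\,q} = \frac{1+K^q}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \frac{1-K^q}{2}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \tag{27}$$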
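The matrix identities used in this step are easy to verify numerically. The following sketch, with arbitrary illustrative values of $K$ and $L$ and an arbitrary short sequence $s_{n+1}, \ldots, s_{n+q}$, checks $P_0 D = KD$ and $DP_0 = D$, and compares the product $P(n+q)\cdots P(n+1)$ with the closed form $P_0^{\,q} + L\big(\sum_k K^{q-1-k} s_{n+1+k}\big)D$. Because $D^2 = 0$, the first-order expansion of the product is in fact exact once $P(n) = P_0 + LDs_n$ is accepted.

```python
import numpy as np

K, L = 0.6, 0.2  # illustrative values, not taken from the paper
P0 = 0.5 * (1 + K) * np.eye(2) + 0.5 * (1 - K) * np.array([[0.0, 1.0], [1.0, 0.0]])
D = np.array([[-1.0, -1.0], [1.0, 1.0]])

def transition_product(s):
    """Product P(n+q) ... P(n+1), with s = [s_{n+1}, ..., s_{n+q}]."""
    Pi = np.eye(2)
    for s_k in s:
        Pi = (P0 + L * s_k * D) @ Pi  # newest factor multiplies on the left
    return Pi

def transition_closed_form(s):
    """P0^q + L * (sum_k K^(q-1-k) s_{n+1+k}) * D."""
    q = len(s)
    weight = sum(K ** (q - 1 - k) * s[k] for k in range(q))
    return np.linalg.matrix_power(P0, q) + L * weight * D

s = [0.1 * np.sin(2 * np.pi * 0.04 * n + 0.3) for n in range(1, 8)]
print(np.allclose(P0 @ D, K * D), np.allclose(D @ P0, D))
print(np.allclose(transition_product(s), transition_closed_form(s)))
```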

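The autocorrelation function of $y_n$ is defined by $\Gamma(n,q) = E[y_n y_{n+q}]$. This function is then expressed as $\Gamma(n,q) = V^2\big[\Pi_{1,1}(n,q)\,p_1(n) + \Pi_{2,2}(n,q)\,p_2(n) - \Pi_{1,2}(n,q)\,p_2(n) - \Pi_{2,1}(n,q)\,p_1(n)\big]$. Using the previous results we get

$$\Gamma(n,q) = V^2\Big[K^q\Big(1 - \frac{4L^2}{|a|^2}\,\tilde{s}_n^{\,2}\Big) + \frac{4L^2}{|a|^2}\,\tilde{s}_{n+q}\,\tilde{s}_n\Big] \tag{28}$$

Notice that $E[y_n] = \frac{2VL}{|a|}\,\tilde{s}_n$. Again, averaging over $n$ leads to

$$\Gamma(q) = V^2 K^q\Big(1 - \frac{2L^2\varepsilon^2}{|a|^2}\Big) + \frac{2V^2L^2\varepsilon^2}{|a|^2}\cos(2\pi q\lambda_0) \tag{29}$$

The correlation function of the noise part of the output is then $\Gamma_{\mathrm{noise}}(q) = V^2 K^{|q|}\big(1 - \frac{2L^2\varepsilon^2}{|a|^2}\big)$, hence $S_{\mathrm{noise}}(\lambda) = V^2\big(1 - \frac{2L^2\varepsilon^2}{|a|^2}\big)\frac{1 - K^2}{|1 - Ke^{-2i\pi\lambda}|^2}$. The SNR of the output is then approximately expressed as

$$\mathrm{SNR}_o = \frac{\varepsilon^2 L^2}{1 - K^2} \tag{30}$$

$\mathrm{SNR}_o$ does not depend on $V$, but the most important result is that $\mathrm{SNR}_o$ does not depend on the frequency $\lambda_0$.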

$\mathrm{SNR}_o$ tends to 0 when $\sigma^2$ tends to 0 or $+\infty$. It follows that there exists an optimal noise variance $\sigma^2_{\mathrm{opt}}$ for which the SNR presents a maximum. This fact confirms that system 2 produces stochastic resonance. Moreover, $S_0(\lambda)$ tends to $V^2$ when $\sigma$ tends to infinity: the output noise spectrum is then constant. This fact is a direct consequence of the thresholding, and not a consequence of the SR phenomenon (unlike what the study of Fauve in [1] would suggest). Indeed, this fact also appears without modulation. For the Schmitt trigger [1] the thresholding is natural and the SR probably appears when the power spectral density of the output noise is roughly constant. Hence Fauve thought that it was as if the sine had "pumped" energy from the noise. Nevertheless, the sine does effectively "pump" energy from the noise, and this at all frequencies (cf. $S_{\mathrm{noise}}(\lambda)$). Furthermore, the sum of the power spectral density of the noise at $\lambda_0$ and the local power of the sine is constant for all $\varepsilon$, and in particular for $\varepsilon = 0$. That confirms the effect of "energy transfer" from the noise to the sine. This cooperative effect can be seen either on the SNR plotted against the variance of the input noise or on the local power of the sine plotted against $\sigma^2$. The previous results are discussed further in the following section.
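The existence of an optimal noise level can be illustrated by evaluating the approximate expression $\mathrm{SNR}_o = \varepsilon^2 L^2 / (1 - K^2)$ on a grid of noise standard deviations. In the sketch below, $K = \mathrm{erf}(c/\sigma) - \mathrm{erf}(-c/\sigma)$ and $L = f_b(c)$ as defined in the text (with $\mathrm{erf}$ the normal distribution function); the grid, the values $c = 10$, $\varepsilon = 1$, and the use of `math.erfc` to compute $1 - K^2$ without cancellation are choices made for this illustration only.

```python
import math

def snr_system2(sigma, c, eps):
    """Approximate output SNR of system 2: eps^2 L^2 / (1 - K^2)."""
    z = c / (sigma * math.sqrt(2.0))
    one_minus_K = math.erfc(z)                   # 1 - K, accurate when K is near 1
    one_minus_K2 = one_minus_K * (2.0 - one_minus_K)
    L = math.exp(-c**2 / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
    return (eps * L) ** 2 / one_minus_K2

c, eps = 10.0, 1.0
sigmas = [2.0 + 0.5 * k for k in range(197)]      # 2.0, 2.5, ..., 100.0
snrs = [snr_system2(s, c, eps) for s in sigmas]
best_index = max(range(len(snrs)), key=snrs.__getitem__)
best_sigma = sigmas[best_index]
print(best_sigma, snrs[best_index])
```

The maximum falls strictly inside the grid: the SNR is small for both weak and strong noise, which is the resonance effect described above.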


3.  Discussion 


Notice first that the SNR of the output of system 1 tends to $+\infty$ when $\sigma \to 0^+$, unlike the SNR of system 2 (Figures 3 and 5). Indeed, with the modulation ($\varepsilon \neq 0$) and without noise ($\sigma = 0$), a particle in a bistable potential does not jump from one well to the other, but periodically moves within one well: the SNR is infinite. Thresholding the output (as in the second system) makes it constant ($\pm V$), since the particle stays in one well. It can also be concluded that $\mathrm{SNR}_o$ of the second system is less than $\mathrm{SNR}_o$ of the first one.

The maximum of the SNR is then explained by a cooperative effect between the sine and the noise. A multi-spectral analysis of those systems may stress the interaction between the modulation and the noise.

Nevertheless, these two systems do not seem useful for detecting a small-amplitude noisy sine by adapting the parameters of the systems to the variance of the noise. Indeed, the SNR of the output presents a maximum for an optimal variance, but it can be observed in Figures 3 and 5 that the gains of the two systems $G = \mathrm{SNR}_o/\mathrm{SNR}_i$, where $\mathrm{SNR}_i$ is the SNR of the input, are always less than or equal to 1. The study of $G$ had never been done [1, 2, 3, 4, 5, 6], but we think that the same result may be found in continuous-time systems. At present we are studying systems similar to system 2 using SETAR(1, N, 0, ..., 0) models and thresholding the output, to generalize some of these results. The approach is to find systems for which the maximum of the local power of the sine (plotted against $\sigma^2$) appears while the spectrum of the noise $S_0(\lambda)$ is roughly constant (cf. [1]). It is expected that the maximum of the SNR is large and that the gain is greater than 1 for such systems.


Figure 1. Example of bimodal potential.

Figure 2. Theoretical local power of the output sine (top figure) and theoretical power spectral density of the output noise at $\lambda_0$ (bottom figure) of system 1 plotted against $\sigma^2$ ($c = 10$, $\lambda_0 = .04$ and $\varepsilon = 1$).

Figure 3. Theoretical output SNR (top) and theoretical gains (bottom) of system 1 plotted against $\sigma^2$ for several $\lambda_0$ ($c = 10$ and $\varepsilon = 1$).

Figure 4. Top figure: theoretical (line) and experimental (dotted line) local power of the output sine of system 2 plotted against $\sigma^2$; bottom figure: theoretical (line) and experimental (dotted line) power spectral density at $\lambda_0$ of the output noise of system 2 plotted against $\sigma^2$ ($V = c = 10$ and $\varepsilon = 1$).

Figure 5. Top figure: theoretical (line curve) and experimental (dotted line) output SNR of system 2 plotted against $\sigma^2$; bottom figure: theoretical (line curve) and experimental (dotted line) gain of system 2 plotted against $\sigma^2$ ($V = c = 10$ and $\varepsilon = 1$).

Abstract

This study presents a new method for texture classification based on higher-order statistics (HOS). We propose the use of third-order correlation tools for texture analysis. We compare the performance of three different tools: the bicorrelation in the spatial domain, the bispectrum in the frequency domain and the bicorspectrum, which is a spatial/frequency representation in that case. We test classification on representative textures of the Brodatz album.

1. Introduction

Many texture analysis methods, based on very different principles, can be found in the literature. Among the most used are:

• the cooccurrence or SGBD matrices [6]

• wavelet-based methods [7]

• Gabor filters [1]

• Markov-based methods [5]

All these methods describe only the second-order statistics.

We propose to test higher-order statistics for texture analysis. We know that moments are not sufficient to describe a texture efficiently. We use dynamic tools, that is to say tools depending on spatial or frequency parameters, as features for classification.

2.  Gaussianity  of  textures 

2.1.  Gaussianity  tests 

Before using higher-order statistics (HOS), it is crucial to check whether the process is Normal or not. Within this framework, skewness and kurtosis tests (third- and fourth-order cumulants) form a set of relevant parameters when measuring normality; they also exhibit a good compromise between numerical complexity and accuracy. We chose several representative Brodatz textures. Sub-images for estimating cumulants have 64x64 pixels. The precision of the estimators only depends on their mean and variance. Thus, 4096 samples are enough to get a good precision for both the skewness and the kurtosis. The measures of the skewness and kurtosis for normalized estimators, presented in Table 1, show that most of the textures have non-symmetric p.d.f.'s (non-zero skewness) [3].
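For concreteness, normalized skewness and kurtosis can be computed from a sub-image as sketched below. This assumes the usual definitions (third central moment divided by the 3/2 power of the variance, and fourth central moment divided by the squared variance minus 3, so that both vanish for a Gaussian); the exact estimators used for Table 1 are not given in the text, and the function name is hypothetical.

```python
import numpy as np

def skewness_kurtosis(samples):
    """Normalized skewness and excess kurtosis of all samples (e.g. a 64x64 sub-image)."""
    x = np.asarray(samples, dtype=float).ravel()
    xc = x - x.mean()
    m2 = np.mean(xc**2)
    skew = np.mean(xc**3) / m2**1.5
    kurt = np.mean(xc**4) / m2**2 - 3.0
    return float(skew), float(kurt)

rng = np.random.default_rng(0)
gaussian_patch = rng.standard_normal((64, 64))   # 4096 samples, as in the text
print(skewness_kurtosis(gaussian_patch))          # both values should be close to 0
```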

2.2.  Image  of  textures 

Figure 1 presents the texture basis. We choose 9 textures of the Brodatz album [2], with different properties qualifying these textures. There are fine or coarse textures, isotropic or not, ...

2.3. Results

Table 1 presents the results of the skewness and kurtosis tests for the textures. The tests are run on 4096 samples.


Texture                      S         K
D9 grass lawn                0.20     -0.82
D17 herringbone weave       -0.48     -0.64
D19 woolen cloth             0.41     -0.42
D24 pressed calf leather     0.65     -0.81
D29 beach sand              -0.54     -0.098
D38 water                   -0.18     -0.23
D68 wood grain              -1.17      0.67
D84 raffia                   0.030    -1.07
D112 plastic bubbles         0.91      0.55

Table 1. Skewness (S) and kurtosis (K) of the textures.

Figure 1. Base of Brodatz textures used

These values of the two cumulants show that most of the textures are non-Gaussian. Only the water texture can be considered as Gaussian. Moreover, the textures are well represented in the third-order statistics. We can thus focus on the third-order statistics, by using third-order dynamic tools.

Remark 1 The use of fourth-order tools such as the tricorrelation has two problems: on the one hand, the variance of the estimators is larger and images have a very limited number of samples. On the other hand, the results of the tricorrelation are less readable, because they lie in a three-dimensional domain for a 1D signal.

3.  Choice  of  a  third  order  tool 

We test three kinds of third-order correlation:

• the bicorrelation in the spatial domain

• the bispectrum in the frequency domain: we choose a non-parametric estimator, the 2D Fourier transform of the bicorrelation

• the bicorspectrum, a new spatial/frequency representation containing third-order correlation

The three tools presented above give a 2D result for a 1D process. We calculate a sampled tool and give the result of an averaged estimation.

3.1. Definition of tools

The bicorrelation of a process is the dynamic version of the third-order cumulant [8]. The bicorrelation of a real stationary process with zero mean is defined by:

$$\gamma_3[m,n] = E\big[x_c[t]\; x_c[t+m]\; x_c[t+n]\big] \tag{1}$$

Figures 2 and 3 present results of bicorrelations on two textures of the Brodatz album: the wool image (D19 woolen cloth) and the wood image (D68 wood grain). We can see that the bicorrelation seems to be discriminant for analysing textures.


Figure 2. Sampling bicorrelation on rows of wood image (panels: wood 2, wood 20)

Figures 4 and 5 present results of bispectra on two textures of the Brodatz album: the wool image (D19 woolen cloth) and the bubbles image (D112 plastic bubbles).

We can see that the bispectrum also seems to be discriminant for analysing textures.

The  Bicorspectrum  or  BET  is  defined  by  [4]  : 

(2) 

where  Xyj  is  the  windowed  signal  x  of  length  2T. 

Figures 6 and 7 present results of bicorspectra on two textures of the Brodatz album: the weave image (D17 herringbone weave) and the grass image (D9 grass lawn). We can see that the bicorspectrum also seems to be discriminant for analysing textures.

3.2.  Choice  of  the  bicorrelation 

By comparing the performance of the different third-order tools, we have shown that the best feature is the bicorrelation. We have compared the tools with one another by testing their rates of correct classification, using a simple distance to discriminate the textures from each other.

Figure 3. Sampling bicorrelation on rows of woolen cloth image (panel titles include wool 17)

We use a biased estimator of the bicorrelation (for $N - 1 \geq m \geq n \geq 0$):

$$\hat{\gamma}_{3,b}[m,n] = \frac{1}{N}\sum_{t=0}^{N-m-1} x_c[t]\; x_c[t+m]\; x_c[t+n] \tag{3}$$
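A direct implementation of the biased estimator is short. The sketch below follows the sum $\frac{1}{N}\sum_{t=0}^{N-m-1} x_c[t]\,x_c[t+m]\,x_c[t+n]$ for $N-1 \geq m \geq n \geq 0$, with $x_c$ the centred signal; the helper `diagonal_features`, which keeps only the values with $m = n$, and its choice of lags $1, \ldots, n_{\mathrm{lags}}$ are assumptions made for illustration.

```python
import numpy as np

def bicorrelation(x, m, n):
    """Biased bicorrelation estimate at lags (m, n), with N-1 >= m >= n >= 0."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()                 # centre the signal
    N = len(xc)
    if not (N - 1 >= m >= n >= 0):
        raise ValueError("lags must satisfy N-1 >= m >= n >= 0")
    T = N - m                         # number of terms in the sum
    return float(np.sum(xc[:T] * xc[m:m + T] * xc[n:n + T]) / N)

def diagonal_features(x, n_lags):
    """Bicorrelation along the principal axis (m = n) for lags 1..n_lags."""
    return np.array([bicorrelation(x, k, k) for k in range(1, n_lags + 1)])

row = np.array([1.0, -2.0, 1.0])      # toy zero-mean "image row"
print(bicorrelation(row, 0, 0), bicorrelation(row, 1, 1), bicorrelation(row, 1, 0))
```

For a texture, one would average such estimates over all rows (and, separately, over all columns) of a sub-image.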

4. Texture Classification using Bicorrelation

4.1.  The  use  of  Bicorrelation 

For a one-dimensional process, the sampled bicorrelation gives us a two-dimensional result. To characterise textures, we want to define a simple vector of one-dimensional parameters. Thus, we have defined a monodimensional tool by only keeping the values for $m = n$, that is to say on the principal axis.

4.2.  Characterisation  of  textures 

We consider a texture as two stationary 1-D random processes, rows and columns, as in image compression by wavelets. This type of image description conserves all the information contained in the initial 2D image.

The mean and the variance are very important parameters to describe textures, but they are not sufficient to obtain good classification performance. The bicorrelation is estimated both for the rows and the columns, and the result, that is to say two vectors of bicorrelation, is given by evaluating the mean of the estimated bicorrelations along the rows and the columns. The sampling bicorrelation is estimated only on the first delays along the principal axis. To complete the statistical study, we add the mean, variance, skewness, kurtosis and three delays of autocorrelation to the vector of parameters. We measure the difference in performance between second-order tools alone and second-order tools augmented with third-order tools.

Figure 4. Sampled bispectrum on rows of plastic bubbles image

Figure 5. Sampled bispectrum on rows of woolen cloth image (panels: wool 6, wool 17)

Figure 6. Sampling bicorspectrum on rows of herringbone weave image

Figure 7. Sampling bicorspectrum on rows of grass image

We compare the results with a basic method, the SGBD or cooccurrence matrices. We have used the four parameters proposed by Haralick [6], for the two unit vectors, i and j. These parameters are: contrast, entropy, correlation and second-order angular moment.

We choose a neural network for discriminating textures. The learning rules are based on a backpropagation algorithm. The associated decision functions can then be written in the form of production rules.


1. a set of 56 images of 64x64 pixels for each texture (leave-one-out rule).

Tool                   Order 2    Order 3    Cooccurrences
Good classification    96.8%      98.8%      95.4%
Non-classification     0.2%       0.2%       0.2%
Bad classification     3.0%       1.0%       4.4%

2. a set of 224 images of 32x32 pixels for each texture (10% for learning set).

Tool                   Order 2    Order 3    Cooccurrences
Good classification    90.3%      92.5%      90.5%
Non-classification     6.2%       4.1%       3.5%
Bad classification     3.5%       3.4%       6%

With 10% for the test set and 90% for the learning set, the results remain quite good despite the decreasing number of samples available to estimate the bicorrelation. We obtain more than 92% of good classification over the test set.


5.  Performance  on  Brodatz  Textures 

We present the results of classification and compare them to the results obtained with parameters of cooccurrence matrices in the same directions. We use a leave-one-out validation method for testing the performance of the different tools. Surprisingly, for large test sub-images, we obtain better results by using correlations than by using cooccurrence matrices.


6.  Conclusion 

To conclude, we propose a set of parameters "adapted" for classification, depending on the size of the analysis sub-image:

• for 64x64 or 32x32 sub-images, the set is composed of the mean, the variance, the skewness, the kurtosis, the first three lags of autocorrelation and the first two lags of bicorrelation, both for the rows and the columns.

• for sub-images of 16x16 or smaller, SGBD matrices reach better performance

ABSTRACT

Bilinear systems are useful to model nonlinear time series. They can be described by a nonlinear recursive equation involving a finite number of parameters. Their analysis and particularly the estimation of the parameters is of central interest. In this paper we establish difference equations between lagged moments and cumulants up to third-order of a simple bilinear model, and show how to use these relations to estimate the parameters.

1. INTRODUCTION

In many signal processing problems the time series are represented by linear models. But in several situations the linear representation fails to describe the involved series satisfactorily. Thus, some nonlinear models have been proposed to handle such time series (see [7] and [3] for a recent review). The most well known of these models is the Volterra series expansion, which was first studied by Wiener. The input-output relation of a Volterra system is written by means of kernels. In theory, a nonlinear series can be expanded with an infinite number of kernels, but in practice the Volterra expansion must be truncated. Unfortunately, there does not exist any approximation criterion which allows us to state a priori at which degree of nonlinearity we may stop the expansion. Furthermore, the numerical complexity increases exponentially with this degree. That is the reason why only quadratic or cubic systems have been studied in the literature.

Recently, a new class of nonlinear models has been introduced in control theory [8, 5]. This is the class of bilinear systems. The input-output relation of such a system is described by a nonlinear recursive equation. The main advantage of this representation is that it involves only a finite number of parameters while the Volterra series expansion of a bilinear system requires an infinite number of kernels [3]. In other words, bilinear models have the same kind of relationship with Volterra systems as ARMA models with linear systems. The counterpart of the reduced number of parameters is that the analysis is more complicated than that of a truncated Volterra system, and this is due to the nonlinear recursive equation. Despite this aspect, bilinear models seem to be an appropriate tool when dealing with nonlinear time series. In section 2 we briefly present generalities on bilinear time series. Then we restrict our study to a simple model. In section 3 we give the expressions of lagged moments up to third-order and in section 4 we apply these formulae to the estimation of the model parameters. The cases where the input $\{e_n\}$ is available, and where $\{e_n\}$ is unobservable, are considered.

2.  BILINEAR  MODELS 

The general discrete-time bilinear series, denoted BL(p, q, r, k), is written

$$x_n = \sum_{i=1}^{p} a_i x_{n-i} + \sum_{i=0}^{q} c_i e_{n-i} + \sum_{i=1}^{r}\sum_{j=1}^{k} b_{ij}\, e_{n-i}\, x_{n-j} \tag{1}$$

where $\{e_n\}$ is a sequence of independent and identically distributed zero-mean random variables with variance $\sigma^2 < \infty$. The coefficients $\{a_i\}$ and $\{c_i\}$ correspond respectively to the AR and the MA part of (1) while the coefficients $\{b_{ij}\}$ correspond to the bilinear part. If $b_{ij} = 0$ for all $i > j$, the model is called superdiagonal. In a similar way, if $b_{ij} = 0$ for $i < j$, the model is subdiagonal. The statistical analysis of a subdiagonal model is much more difficult. This is due to the fact that in the last term on the right-hand side of (1), the contribution of $e_{n-i} x_{n-j}$ is nonzero when $x_{n-j}$ is posterior to $e_{n-i}$. When $\{x_n\}$ is a causal function of $\{e_n\}$, $x_{n-j}$ is then correlated with $e_{n-i}$, and this is why the analysis is more complicated. Finally, if $b_{ij} = 0$ for $i \neq j$, the model is said to be diagonal.

The  study  of  the  statistical  properties  of  a  bi¬ 
linear  model  began  with  the  works  of  Granger,  An¬ 
dersen  [2]  and  Subba  Rao  [10].  The  conditions  to 
ensure  the  existence  and  uniqueness  of  a  station¬ 
ary  and  ergodic  solution  of  (1)  are  more  difficult 
to  obtain  than  in  the  linear  case.  They  have  been 
studied  by  several  authors  in  some  special  cases 
[6,  1].  Contrary  to  the  linear  case,  the  stationar- 
ity  condition  depends  on  the  variance  of  the  input 
{c„}.  For  example,  a  BL(1, 0,1,1)  model  is  strictly 
stationary  if  and  only  if  \a\  -f  <  1.^  When 

using a model (linear or nonlinear) for prediction, we require that this model be invertible, i.e., that its input can be expressed as a function of its output. When the model is linear, this condition is easy to obtain and does not depend on the distribution of the output. When the model is bilinear, this is not the case and the condition is much more difficult to obtain.

The parameters of a bilinear model can be estimated either by a maximum likelihood method [11], or by a method of moments. Although the maximum likelihood estimator is usually more efficient than the moment estimator, the latter is simpler to implement. The method of moments consists in using recursive equations involving moments or cumulants and replacing the theoretical values by their estimates. The second-order properties of a bilinear system are similar to those of a linear system. More precisely, a BL(p, q, p, q) model has the same covariance structure as an ARMA(p, q) model. Consequently, we cannot distinguish a bilinear series from a linear one with a classical covariance analysis. It is then necessary to also study some high-order cumulants. A method of moments has been used to identify very simple models in [2, 9, 4]. In particular, the MA part is absent in these models (q = 0 in (1)). To our knowledge, the case of the general bilinear model (1) is not solved. The reason is that the complexity of the calculations increases considerably even when only one term is added.

(Footnote 1) Note that this strict stationarity condition does not necessarily imply the existence of the moments.

3.  EXPRESSIONS  OF  THE  MOMENTS 
UP  TO  THIRD-ORDER 

In the rest of the paper, we consider the bilinear model BL(1,1,1,1)

$$x_n = a x_{n-1} + e_n + c e_{n-1} + b e_{n-1} x_{n-1} \qquad (2)$$

where $\{e_n\}$ is a sequence of independent and identically distributed Gaussian and zero-mean random variables with variance $\sigma^2$. Furthermore, we assume that

$$a^2 + b^2\sigma^2 < 1 \qquad (3)$$

which is a sufficient condition for the existence and uniqueness of a stable, causal and strictly stationary solution for the model (2).

The moments can be calculated directly using (2), but the calculations are tedious because of the diagonal term $b e_{n-1} x_{n-1}$. Nevertheless, under condition (3) the model can be written with the Markovian representation

$$x_n = z_{n-1} + e_n \qquad (4)$$
$$z_n = \alpha_n z_{n-1} + \beta_n e_n \qquad (5)$$
$$\alpha_n = a + b e_n \qquad (6)$$
$$\beta_n = a + c + b e_n. \qquad (7)$$

As we are interested in a causal solution, we deduce that $z_n$ depends on $e_{n-k}$ only when $k \geq 0$. Consequently, $z_n$ and $e_{n+k}$ are independent when $k > 0$, and the use of equations (4)-(7) instead of (2) considerably reduces the complexity of the calculations needed to express the moments.
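
The equivalence between the direct recursion and the Markovian form can be checked numerically. The following is a minimal sketch, assuming that model (2) reads $x_n = a x_{n-1} + e_n + c e_{n-1} + b e_{n-1} x_{n-1}$ and that the state is $z_n = (a + b e_n) x_n + c e_n$; the function names and the zero start-up values are illustrative choices, not part of the paper.

```python
import numpy as np

def simulate_direct(a, b, c, e):
    """Direct recursion (2), assuming x_{-1} = e_{-1} = 0 at start-up."""
    x = np.zeros(len(e))
    x[0] = e[0]
    for n in range(1, len(e)):
        x[n] = a * x[n - 1] + e[n] + c * e[n - 1] + b * e[n - 1] * x[n - 1]
    return x

def simulate_markov(a, b, c, e):
    """State form (4)-(7): x_n = z_{n-1} + e_n, z_n = alpha_n z_{n-1} + beta_n e_n."""
    x = np.zeros(len(e))
    z = 0.0  # z_{-1} = 0 matches the start-up assumption above
    for n in range(len(e)):
        x[n] = z + e[n]
        alpha = a + b * e[n]
        beta = a + c + b * e[n]
        z = alpha * z + beta * e[n]
    return x

rng = np.random.default_rng(0)
e = rng.standard_normal(2000)          # Gaussian input with sigma^2 = 1
x_direct = simulate_direct(-0.5, 0.3, 0.1, e)
x_markov = simulate_markov(-0.5, 0.3, 0.1, e)
print(np.allclose(x_direct, x_markov))  # both forms generate the same path
```

The state form makes the independence argument visible: at step n the new input $e_n$ multiplies a state that was built only from earlier inputs.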

In the following, we derive the difference equations between lagged moments up to third-order defined by $m_1 = E(x_n)$, $m_2(k) = E(x_n x_{n-k})$ and $m_3(k_1,k_2) = E(x_n x_{n-k_1} x_{n-k_1-k_2})$. Note that the definition of $m_3(k_1,k_2)$ is different from the usual definition $E(x_n x_{n-k_1} x_{n-k_2})$. This definition is well suited to the structure of the third-order moments of (2), but the price to pay is that $m_3(k_1,k_2) \neq m_3(k_2,k_1)$ in general. Because of symmetry relations, we can reduce our study to the case where $k, k_1, k_2$ are nonnegative. The details of the calculations are omitted.


• The first-order moment of $\{x_n\}$ can be written

$$m_1 = \frac{b\sigma^2}{1-a}.$$

Note that according to (3), $a \neq 1$, and then $m_1$ is well-defined.


• The second-order moments of $\{x_n\}$ can be written

$$m_2(0) = \sigma^2\left[2b^2\sigma^2 + c^2 + 2ac + 1 + 2b(2a+c)m_1\right] / (1 - a^2 - b^2\sigma^2)$$
$$m_2(1) = a\,m_2(0) + 2b\sigma^2 m_1 + c\sigma^2$$
$$m_2(k) = a\,m_2(k-1) + b\sigma^2 m_1$$

for all $k > 1$. The stationarity condition (3) is sufficient to ensure the existence of the function $m_2$.
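
As an illustration, the first- and second-order moment recursions can be evaluated directly. This is a sketch under the assumption that the expressions read $m_1 = b\sigma^2/(1-a)$, $m_2(0) = \sigma^2[2b^2\sigma^2 + c^2 + 2ac + 1 + 2b(2a+c)m_1]/(1 - a^2 - b^2\sigma^2)$, $m_2(1) = a m_2(0) + 2b\sigma^2 m_1 + c\sigma^2$ and $m_2(k) = a m_2(k-1) + b\sigma^2 m_1$ for $k > 1$; the function name is hypothetical.

```python
def second_order_moments(a, b, c, sigma2, kmax):
    """Return m1 and the list [m2(0), ..., m2(kmax)] for the BL(1,1,1,1) model."""
    d = 1.0 - a**2 - b**2 * sigma2
    if d <= 0.0:
        raise ValueError("condition (3) a^2 + b^2 sigma^2 < 1 is violated")
    m1 = b * sigma2 / (1.0 - a)
    m2 = [sigma2 * (2 * b**2 * sigma2 + c**2 + 2 * a * c + 1
                    + 2 * b * (2 * a + c) * m1) / d]
    if kmax >= 1:
        m2.append(a * m2[0] + 2 * b * sigma2 * m1 + c * sigma2)
    for k in range(2, kmax + 1):
        m2.append(a * m2[k - 1] + b * sigma2 * m1)
    return m1, m2

# With b = 0 the model is an ARMA(1,1) and m2(0) reduces to its usual variance.
m1, m2 = second_order_moments(0.5, 0.0, 0.4, 1.0, 3)
print(m1, m2)
```

Setting b = 0 is a convenient sanity check, since the bilinear term vanishes and the well-known ARMA(1,1) second-order structure must be recovered.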


• The expression of the function $m_3$ is as follows. We have

$$m_3(0,0) = \frac{B_1 m_2(0) + C_1 m_1 + D_1}{1 - a^3 - 3ab^2\sigma^2}$$

where

$$B_1 = 3b\sigma^2(3b^2\sigma^2 + 3a^2 + 2ac)$$
$$C_1 = 3\sigma^2[6b^2\sigma^2(a + c) + 2a^2c + ac^2 + 1]$$
$$D_1 = 3\sigma^4 b(2b^2\sigma^2 + 4ac + 3c^2).$$

On the other hand

$$m_3(0,1) = A_2 m_3(0,0) + B_2 m_2(0) + C_2 m_1 + D_2$$

where

$$A_2 = a^2 + b^2\sigma^2$$
$$B_2 = 2b\sigma^2(3a + c)$$
$$C_2 = \sigma^2(6b^2\sigma^2 + c^2 + 4ac + 1)$$
$$D_2 = 4bc\sigma^4$$


and

$$m_3(0,k) = A_3 m_3(0,k-1) + B_3 m_2(k-1) + C_3 m_1$$

for all $k > 1$, with

$$A_3 = A_2$$
$$B_3 = 2b\sigma^2(2a + c)$$
$$C_3 = \sigma^2(2b^2\sigma^2 + c^2 + 2ac + 1).$$


Lastly

$$m_3(1,0) = a\,m_3(0,0) + 3b\sigma^2 m_2(0) + 2c\sigma^2 m_1$$
$$m_3(1,k) = a\,m_3(0,k) + 2b\sigma^2 m_2(k) + c\sigma^2 m_1$$

for all $k > 0$, and

$$m_3(k_1,k_2) = a\,m_3(k_1 - 1,k_2) + b\sigma^2 m_2(k_2)$$

for all $k_1 > 1$ and $k_2 \geq 0$. The previous expressions are obtained under the assumption

$$|a|(a^2 + 3b^2\sigma^2) < 1. \qquad (8)$$

This is a sufficient condition to ensure the existence of the function $m_3$. Note that the stationarity condition (3) does not necessarily imply (8) and, as a consequence, the process $\{x_n\}$ can be strictly stationary but not third-order stationary since the function $m_3$ may not exist.

4.  IDENTIFIABILITY 

In this section we use the explicit relations between moments and cumulants and the expressions of the moments given in the previous section to derive the lagged cumulants of the bilinear process up to third-order. Then we address the problem of the determination of the parameters of (2) using these cumulants. We assume that the conditions (3) and (8) hold. Let $c_1 = \mathrm{Cum}(x_n)$, $c_2(k) = \mathrm{Cum}(x_n, x_{n-k})$ and $c_3(k_1,k_2) = \mathrm{Cum}(x_n, x_{n-k_1}, x_{n-k_1-k_2})$.

• The second-order cumulant function $c_2$ is given by

$$c_2(0) = \frac{b^2\sigma^4\left[b^2\sigma^2 + (a + 2c + 1)(1 - a)\right]}{(1 - a^2 - b^2\sigma^2)(1 - a)^2} + \frac{\sigma^2(c^2 + 2ac + 1)}{1 - a^2 - b^2\sigma^2} \qquad (9)$$

$$c_2(1) = a\,c_2(0) + c\sigma^2 + \frac{b^2\sigma^4}{1 - a} \qquad (10)$$

$$c_2(k) = a\,c_2(k-1) \qquad (11)$$

for all $k > 1$.


• The third-order cumulant function $c_3$ satisfies

$$c_3(0,k) = (a^2 + b^2\sigma^2)\,c_3(0,k-1) + 2b\sigma^2\left[a + c + \frac{b^2\sigma^2}{1 - a}\right] c_2(k-1) \qquad (12)$$

for all $k > 1$,

$$c_3(1,0) = a\,c_3(0,0) + 2b\sigma^2 c_2(0) \qquad (13)$$

$$c_3(1,k) = a\,c_3(0,k) + b\sigma^2 c_2(k) \qquad (14)$$

for all $k > 0$, and

$$c_3(k_1,k_2) = a\,c_3(k_1 - 1,k_2) \qquad (15)$$

for all $k_1 > 1$ and $k_2 \geq 0$. If $c_2(k) = 0$ for all $k \neq 0$ and $c_3(k_1,k_2) = 0$ for all $k_1 \geq 0$ and $k_2 \geq 0$, it can be shown that we have $a = b = c = 0$ and $\sigma^2 = c_2(0)$, which means that $\{x_n\}$ is a Gaussian independent process. Now suppose that there exists one index $k' > 0$ such that $c_2(k') \neq 0$ or two indices $k_1' > 0$ and $k_2' \geq 0$ such that $c_3(k_1',k_2') \neq 0$. Then the AR coefficient $a$ can be estimated either from (11) where $k = k' + 1$, and we have

$$a = \frac{c_2(k' + 1)}{c_2(k')} \qquad (16)$$

or from (15) where $k_1 = k_1' + 1$ and $k_2 = k_2'$, and we have

$$a = \frac{c_3(k_1' + 1, k_2')}{c_3(k_1', k_2')} \qquad (17)$$

The parameters $b$ and $c$ cannot be directly estimated. By introducing $\lambda = b\sigma^2$ and $\mu = c\sigma^2$, we deduce from (13) that

$$\lambda = \frac{c_3(1,0) - a\,c_3(0,0)}{2c_2(0)} \qquad (18)$$

and then it results from (10) that

$$\mu = c_2(1) - a\,c_2(0) - \frac{\lambda^2}{1 - a}. \qquad (19)$$

Note that the identification of the parameters $a$, $\lambda$ and $\mu$ is not possible using only the second-order cumulants. On the one hand, if $\sigma^2$ is known, the model is identifiable and we have $b = \lambda/\sigma^2$ and $c = \mu/\sigma^2$. On the other hand, if $\sigma^2$ is not known, it must be estimated from the relations between the cumulants. Equation (12) is equivalent to

$$H_1(k)\,\sigma^2 = H_2(k) \qquad (20)$$

for all $k > 1$, where

$$H_1(k) = c_3(0,k) - a^2 c_3(0,k-1) - 2\lambda a\,c_2(k-1)$$
$$H_2(k) = \lambda^2 c_3(0,k-1) + 2\lambda\left[\mu + \frac{\lambda^2}{1 - a}\right] c_2(k-1).$$

It is not always possible to obtain $\sigma^2$ from (20) since the terms $H_1(k)$ and $H_2(k)$ may be zero for all $k > 1$. When $H_2(k_0) \neq 0$ for some $k_0 > 1$, the model is identifiable and theoretically $\sigma^2$ can be estimated from (20) when $k = k_0$. However, from a practical point of view this latter equation is not numerically stable and it would be better to use equation (9), which can be written as a second-order polynomial in $\sigma^2$. This polynomial admits two solutions but it can be shown after some tedious calculations that only the greater solution satisfies equation (20). When $H_2(k_0) = 0$ for all $k_0 > 1$, the model is not identifiable and there exist two sets of parameters that are solutions of the problem.
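
The chain (16), (18), (19) can be sketched as a small routine. It assumes the readings $a = c_2(k'+1)/c_2(k')$, $\lambda = [c_3(1,0) - a c_3(0,0)]/(2c_2(0))$ and $\mu = c_2(1) - a c_2(0) - \lambda^2/(1-a)$; the function name and argument layout are illustrative, and in practice the cumulants would be replaced by sample estimates.

```python
def identify_from_cumulants(c2, c3_00, c3_10, k=1):
    """Recover (a, lambda, mu) from cumulants.

    c2    : sequence with c2[0], c2[1], ..., c2[k+1]
    c3_00 : c3(0, 0)
    c3_10 : c3(1, 0)
    k     : lag k' > 0 with c2[k] != 0, used in (16)
    """
    if k < 1 or c2[k] == 0:
        raise ValueError("need a lag k > 0 with non-zero c2(k)")
    a = c2[k + 1] / c2[k]                        # equation (16)
    lam = (c3_10 - a * c3_00) / (2.0 * c2[0])    # equation (18), lambda = b sigma^2
    mu = c2[1] - a * c2[0] - lam**2 / (1.0 - a)  # equation (19), mu = c sigma^2
    return a, lam, mu

# Synthetic cumulants built to satisfy (10), (11) and (13) with
# a = 0.5, lambda = 0.3, mu = 0.1 (values chosen for illustration only).
a, lam, mu = identify_from_cumulants([2.0, 1.28, 0.64], c3_00=1.0, c3_10=1.7)
print(a, lam, mu)
```

When $\sigma^2$ is known, $b$ and $c$ follow as `lam / sigma2` and `mu / sigma2`; otherwise $\sigma^2$ has to be obtained from (9) or (20) as discussed above.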

The method of moments consists in replacing the theoretical cumulants by their estimates obtained from a finite number of output samples and solving for the parameters. When the input is available, $\sigma^2$ is directly estimated from the data and we can then identify the four parameters of the model. On the other hand, when the input is not observable, $\sigma^2$ is not known and has to be estimated using the output cumulants as described above. To illustrate these results we have generated 100 realizations, each of 1000 samples, for two sets of parameters: $a = -0.5$, $b = 0.3$, $c = 0.1$, $\sigma^2 = 1$ and $a = 0.2$, $b = -0.2$, $c = 0.2$, $\sigma^2 = 1$. Assuming that only the output is available, we have estimated the parameters using equations (16-17), (18), (19) and (9). The means and variances of the estimates are given in Tables 1 and 2. We can see that the results are satisfactory and are in accordance with the results obtained in [9] for the identification of a BL(1,0,1,1) model which does not contain an MA part.


Table 1

              a            b            c            σ²
true value   -0.5          0.3          0.1          1
mean         [illegible]   [illegible]  0.0901       1.0064
variance      0.0028       0.0011       0.0056       0.0023

Table 2

              a            b            c            σ²
true value    0.2         -0.2          0.2          1
mean         [illegible]
variance      0.0053, 0.0016 (columns and remaining entries illegible)

Signal Processing with Alpha-Stable Distributions:
Current  and  Future  Trends 


Dr.  Chrysostomos  L.  Nikias,  Univ.  of  Southern  California  (USA) 

The  importance  of  extending  the  statistical  signal  processing  methodology  to  the  alpha-stable  framework  is 
apparent.  First,  scientists  and  engineers  have  started  to  appreciate  lower-order  moments  and  the  elegant 
scaling  and  self-similarity  properties  of  stable  distributions.  Additionally,  real  life  applications  exist  in 
which  impulsive  channels  tend  to  produce  large-amplitude,  short-duration  interferences  more  frequently 
than  Gaussian  channels  do.  The  stable  law  has  been  shown  to  successfully  model  noise  over  certain 
impulsive  channels.  In  this  talk,  we  present  an  overview  of  alpha-stable  processes  and  lower-order  statistics. 
In  addition,  we  review  recent  advanced  techniques  for  detection,  parameter  estimation,  and  system 
identification  in  the  presence  of  signals/noise  modeled  as  stable  processes.  Finally,  we  address  future  trends 
on signal processing within the alpha-stable framework.

Abstract

A family of blind equalisation algorithms identifies a channel model based on a higher-order cumulant (HOC) fitting approach. Since HOC cost functions are multimodal, gradient search techniques require a good initial estimate to avoid converging to local minima. We present a blind identification scheme which uses genetic algorithms (GAs) to optimise a HOC cost function. Because GAs are efficient global optimal search strategies, the proposed method is guaranteed to find a global optimal channel estimate. A micro-GA implementation is adopted to further enhance computational efficiency. As is demonstrated in computer simulation, this GA based scheme is robust and accurate, and converges fast.

1.  Introduction 

An important class of blind equalisation algorithms uses techniques based on HOCs [1]-[4]. A two-stage strategy is usually adopted, which first identifies a channel model using HOC fitting algorithms and then employs the estimated channel model to design an equaliser. The key step of this approach is its ability to obtain an accurate channel model. Once a channel model is available, a variety of existing equaliser design methods can be employed, ranging from a simple linear inverse filter to a sophisticated maximum likelihood sequence estimator, depending on a trade-off between performance and complexity. Therefore, we concentrate on blind channel identification using the HOC fitting approach in this paper.

HOC fitting cost functions are however multimodal, and conventional gradient techniques [3],[4] may converge to “wrong” solutions unless a good initial value for the channel parameters is provided, which is not always possible. To overcome the problem of local minima, simulated annealing has been implemented to optimise a HOC cost function [5]. We propose to use GAs [6]-[9] for blind channel identification based on HOC fitting. This GA based scheme is very robust and achieves a global optimal solution regardless of the initial value of the channel estimate. Furthermore, the number of parameters to be optimised in the problem of blind channel estimation is usually small, and GAs are particularly effective in solving this kind of optimisation problem. The micro-GA implementation [8] is adopted to further improve the convergence rate.

The channel is modelled as a finite impulse response filter with an additive Gaussian white noise:

$$r(k) = \sum_{i=0}^{n_a} a_i s(k - i) + e(k) \qquad (1)$$

Blind channel identification refers to the determination of the channel model $\mathbf{a} = [a_0\ a_1 \cdots a_{n_a}]^T$ using only the noisy received signal $\{r(k)\}$ and some knowledge of the statistical properties of $s(k)$. We will assume a real-valued channel and a PAM symbol constellation. Extension to complex-valued channels and modulation schemes is straightforward.

2. Higher-order cumulant fitting

Since 3rd-order cumulants of $r(k)$ are zero, 4th-order cumulants have to be used [10]. The 4th-order cumulant fitting cost function adopted in our application is defined by:

$$J_4(\hat{\mathbf{a}}) = \sum_{\tau=-\hat{n}_a}^{\hat{n}_a} \left( \hat{C}_{4,r}(\tau,\tau,\tau) - \gamma_{4,s} \sum_{i=\max\{0,-\tau\}}^{\min\{\hat{n}_a,\,\hat{n}_a-\tau\}} \hat{a}_i \hat{a}_{i+\tau}^3 \right)^2 \qquad (2)$$

where $\hat{C}_{4,r}(\tau,\tau,\tau)$ is the diagonal slice of the time estimate $\hat{C}_{4,r}(\tau_1,\tau_2,\tau_3)$ of the 4th-order cumulant, $\gamma_{4,s}$ the kurtosis of $s(k)$, $\hat{n}_a$ an estimated channel length and $\hat{\mathbf{a}}$ the channel estimate.
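
The cost function can be sketched in a few lines. The code below assumes that (2) compares the sample diagonal cumulant slice with $\gamma_{4,s} \sum_i \hat{a}_i \hat{a}_{i+\tau}^3$, and that $r(k)$ is real and zero-mean so the slice can be estimated as $E[r(k) r^3(k+\tau)] - 3E[r(k) r(k+\tau)]E[r^2(k+\tau)]$. All function names are hypothetical.

```python
import numpy as np

def cum4_diag(r, tau):
    """Sample estimate of cum(r(k), r(k+tau), r(k+tau), r(k+tau)), zero-mean real r."""
    r = np.asarray(r, dtype=float)
    if tau >= 0:
        x, y = r[:len(r) - tau], r[tau:]
    else:
        x, y = r[-tau:], r[:len(r) + tau]
    return np.mean(x * y**3) - 3.0 * np.mean(x * y) * np.mean(y**2)

def model_diag(a_hat, tau):
    """Model term sum_i a_i * a_{i+tau}^3 over the valid index range of (2)."""
    a_hat = np.asarray(a_hat, dtype=float)
    n = len(a_hat) - 1
    i = np.arange(max(0, -tau), min(n, n - tau) + 1)
    return float(np.sum(a_hat[i] * a_hat[i + tau]**3))

def j4_cost(a_hat, c4_hat, gamma4s):
    """Cost (2); c4_hat maps each lag tau in [-n, n] to its cumulant estimate."""
    n = len(a_hat) - 1
    return sum((c4_hat[t] - gamma4s * model_diag(a_hat, t))**2
               for t in range(-n, n + 1))

# Usage sketch: estimate the slice once from data, then evaluate candidates.
rng = np.random.default_rng(0)
s = rng.choice([-3.0, -1.0, 1.0, 3.0], size=20000)   # 4-PAM symbols (illustrative)
r = np.convolve(s, [1.0, 0.5])[:len(s)]              # noiseless two-tap channel
c4_hat = {t: cum4_diag(r, t) for t in range(-1, 2)}
gamma4s = np.mean(s**4) - 3.0 * np.mean(s**2)**2      # sample kurtosis of s(k)
print(j4_cost([1.0, 0.5], c4_hat, gamma4s), j4_cost([0.5, 1.0], c4_hat, gamma4s))
```

Because the cumulant slice is computed once, each candidate evaluation only costs the model term, which is what makes population-based search affordable here.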

Most of the existing algorithms for HOC fitting employ gradient search techniques. HOC fitting cost functions are well-known to be multimodal, and gradient based optimisation methods may fail to work. Even with measures for providing a good initial channel estimate, it has been observed that gradient algorithms sometimes converge to local minima [3]. Using GAs to optimise the cost function (2) has the advantage of guaranteeing to find a global optimal channel estimate.

A channel model (1) typically contains a few taps. Thus the number of parameters to be optimised in (2) is small, and GAs are efficient in solving this kind of “small-dimensional” optimisation problem. In reality, the channel order $n_a$ is unknown. A simple method is to overfit with $\hat{n}_a > n_a$. Although this will complicate the cost function and may cause problems for gradient based methods, the GA based method is capable of identifying those nonexisting taps with (near) zero values. An inspection of the obtained channel estimate allows us to delete those insignificant taps.

3.  Genetic  algorithms 

GAs belong to a problem solving approach based loosely on the evolution of species in nature. They differ from gradient optimisation techniques mainly in four aspects [7]. Firstly, GAs work with an encoding of the parameter set to be searched, not the parameters themselves. Secondly, unlike gradient techniques, which concentrate their efforts on a single potential solution in the search space, GAs search with a population of potential solutions. Thirdly, GAs use the value of the objective function (termed fitness), not derivatives, to evaluate potential solutions. Lastly, GAs use probabilistic transition rules. The seemingly undirected search is guided by the fitness value of each individual and how it compares with others.

A popular encoding scheme is the bit-string encoding [6], which is adopted in our application. A simple GA usually consists of three operations, namely selection, crossover and mutation [7], at each cycle or generation. An “elitist” strategy, which automatically copies a few of the best solutions in the population into the next generation, is often incorporated. In the crossover operation, we adopt multiple crossover points [7], and the number of crossover points in our application is 4.

Since our goal is to find a global optimum solution quickly, the micro-GA [8] offers certain advantages. The population size in a micro-GA is much smaller than those in “standard” GAs. This feature of the micro-GA not only makes it particularly useful for nonstationary optimisation problems but also improves convergence rate in general [8]. Simply adopting a very small population size and letting the search converge just once is not very useful apart from quickly locating some local optimum. In a micro-GA, after the search has converged, the population is reinitialised with random values while the best individual found so far is copied to the newly generated population. The reinitialisation is repeated until no further improvement can be achieved.

In general, the more complex the search space is, the larger the population size should be. The population size in our micro-GA is twice the number of parameters. The crossover rate is set to 1.0 to facilitate a high rate of information exchange while the mutation rate is set to 0.0, as the reinitialisation of the population keeps the diversity of potential solutions fairly well. Most GAs adopt proportional selection [7]. Due to the small population size of the micro-GA, tournament selection is used in choosing parents [8].

4. GA based cumulant fitting scheme


Our GA based scheme is summarized as follows:
Step 1. Given a set of received data samples $\{r(k)\}$, assume an overlength $\hat{n}_a$ and compute the time estimate of the required cumulant to form the cost function (2).
Step 2. With a set of initial channel parameter vectors $\{\hat{\mathbf{a}}_i\}_{i=1}^{n_p}$, where $n_p$ is the population size, use the micro-GA to optimise the cost function (2).

The fitness function value $f_i$ for $\hat{\mathbf{a}}_i$ is defined as

$$f_i = \frac{1}{J_4(\hat{\mathbf{a}}_i)} \qquad (3)$$

A measure is used to ensure that each $\hat{\mathbf{a}}_i$ satisfies

$$\sum_{j=0}^{\hat{n}_a} \hat{a}_{i,j}^4 = \frac{\hat{C}_{4,r}(0,0,0)}{\gamma_{4,s}} \qquad (4)$$

In the population initialisation, the taps of each $\hat{\mathbf{a}}_i$ first take values randomly from $(-1, 1)$, and each chosen $\hat{\mathbf{a}}_i$ is normalised using (4). When a new population is produced, each member is also normalised. This ensures that the population is inside the feasible set of channel models and has the effect of improving the convergence rate.
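
The initialisation and normalisation step can be illustrated as follows. This is a sketch assuming that constraint (4) fixes the sum of fourth powers of the taps to $\hat{C}_{4,r}(0,0,0)/\gamma_{4,s}$; the function names and the numerical values are hypothetical.

```python
import numpy as np

def normalise(a_hat, c4_000, gamma4s):
    """Rescale a tap vector so that sum_j a_j^4 equals c4_000 / gamma4s."""
    a_hat = np.asarray(a_hat, dtype=float)
    target = c4_000 / gamma4s          # positive when both share the same sign
    scale = (target / np.sum(a_hat**4)) ** 0.25
    return a_hat * scale

def init_population(n_pop, n_taps, c4_000, gamma4s, rng):
    """Draw taps uniformly in (-1, 1) and project each member onto constraint (4)."""
    raw = rng.uniform(-1.0, 1.0, size=(n_pop, n_taps))
    return np.array([normalise(v, c4_000, gamma4s) for v in raw])

rng = np.random.default_rng(0)
pop = init_population(n_pop=10, n_taps=5, c4_000=-2.4, gamma4s=-1.2, rng=rng)
print(np.sum(pop**4, axis=1))   # every member now satisfies the constraint
```

A common scale factor is the simplest way to enforce the constraint because it leaves the shape of the candidate impulse response unchanged.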


It is well-known that sign and time-shift ambiguities exist in blind identification. Sign ambiguity is due to the fact that both $\mathbf{a}$ and $-\mathbf{a}$ are global optimal solutions of (2). Time-shift ambiguity can be illustrated as follows. Let the true channel be $[a_0 \cdots a_{n_a}]^T$, $a_0 \neq 0$ and $a_{n_a} \neq 0$. Let $\hat{n}_a = n_a + 2$. Then $[a_0 \cdots a_{n_a}\ 0\ 0]^T$, $[0\ a_0 \cdots a_{n_a}\ 0]^T$ and $[0\ 0\ a_0 \cdots a_{n_a}]^T$ are all global optimal solutions. A solution to time-shift ambiguity is to fix the first tap. We do not fix a tap but use the following measure. If the first tap of a member is zero (absolute value smaller than a threshold), a shifting is performed to ensure that the first tap is always nonzero. The complexity of the GA based scheme is determined by the number of cost-function evaluations. The micro-GA employed is specifically designed to minimise this complexity.
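
The search loop and the shifting measure can be sketched as below. This is not the authors' implementation: it assumes a real-valued encoding with uniform crossover instead of the bit-string encoding with four crossover points, and it restarts after a fixed number of generations instead of testing for convergence. The names `micro_ga`, `shift_leading_zeros` and `repair` are hypothetical.

```python
import numpy as np

def shift_leading_zeros(a_hat, threshold=1e-2):
    """Shift taps left until the first tap is non-negligible (time-shift ambiguity)."""
    a = np.array(a_hat, dtype=float)
    while abs(a[0]) < threshold and np.any(np.abs(a) >= threshold):
        a = np.roll(a, -1)   # drop the leading near-zero tap ...
        a[-1] = 0.0          # ... and pad a zero at the end
    return a

def micro_ga(cost, n_params, rng, n_restarts=20, n_gen=30, repair=None):
    """Minimise `cost` with a small population, elitism and random restarts."""
    fix = repair if repair is not None else (lambda v: v)
    n_pop = 2 * n_params                      # population size used in the paper
    best, best_cost, history = None, np.inf, []
    for _ in range(n_restarts):
        pop = np.array([fix(v) for v in rng.uniform(-1.0, 1.0, (n_pop, n_params))])
        if best is not None:
            pop[0] = best                     # carry the best individual over
        for _ in range(n_gen):
            costs = np.array([cost(p) for p in pop])
            i = int(np.argmin(costs))
            if costs[i] < best_cost:
                best, best_cost = pop[i].copy(), float(costs[i])
            children = [best.copy()]          # elitist copy
            while len(children) < n_pop:
                parents = []
                for _ in range(2):            # tournament selection (size 2)
                    j, k = rng.choice(n_pop, size=2, replace=False)
                    parents.append(pop[j] if costs[j] <= costs[k] else pop[k])
                mask = rng.random(n_params) < 0.5   # crossover rate 1.0, no mutation
                children.append(fix(np.where(mask, parents[0], parents[1])))
            pop = np.array(children)
        history.append(best_cost)
    return best, best_cost, history

# Toy usage on a quadratic cost (illustrative only, not the HOC cost).
target = np.array([0.5, -0.3, 0.2])
best, best_cost, history = micro_ga(lambda v: float(np.sum((v - target)**2)),
                                    n_params=3, rng=np.random.default_rng(0))
print(best, best_cost)
```

Minimising the cost is equivalent to maximising the fitness (3); a `repair` callable combining the normalisation (4) with `shift_leading_zeros` would keep every member inside the feasible set.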


5.  Simulation  results 


The two channels used in the simulation were:

Channel 1: $\mathbf{a} = [-0.21\ \ {-0.50}\ \ 0.72\ \ 0.36\ \ 0.21]^T$
Channel 2: $\mathbf{a} = [0.227\ \ 0.460\ \ 0.688\ \ 0.460\ \ 0.227]^T$

The performance of the algorithm was assessed through the best cost function value $J_4(\hat{\mathbf{a}})$, where $\hat{\mathbf{a}}$ was the best channel model in the population, and the mean tap error (MTE) defined by:

$$\mathrm{MTE} = \|\pm\hat{\mathbf{a}} - \mathbf{a}\|^2 = \sum_{i=0}^{n_a} (\pm\hat{a}_i - a_i)^2 \qquad (5)$$

In the expression (5), $-\hat{\mathbf{a}}$ is used if $\hat{\mathbf{a}}$ converges to $-\mathbf{a}$. Otherwise $\hat{\mathbf{a}}$ is used.
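
A possible way to compute the MTE is sketched below. It assumes that the sign in (5) can be resolved by taking the smaller of the two squared errors, and that only the first $n_a + 1$ taps of an overlength estimate enter the sum; the function name is hypothetical.

```python
import numpy as np

def mean_tap_error(a_hat, a_true):
    """MTE (5) with the sign ambiguity resolved by the smaller squared error."""
    a_true = np.asarray(a_true, dtype=float)
    # Assumes a_hat has at least as many taps as a_true; extra taps are ignored.
    a_hat = np.asarray(a_hat, dtype=float)[:len(a_true)]
    err_plus = np.sum((a_hat - a_true)**2)
    err_minus = np.sum((-a_hat - a_true)**2)
    return float(min(err_plus, err_minus))

channel_1 = [-0.21, -0.50, 0.72, 0.36, 0.21]
print(mean_tap_error([0.21, 0.50, -0.72, -0.36, -0.21], channel_1))  # sign-flipped copy
```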

8-PAM data symbols were transmitted and 50000 noisy received data samples were used to compute the time estimate of the 4th-order cumulant. All the results were averaged over 100 different runs. Figs. 1 and 2 depict evolutions of the cost function and the MTE for channel 1 with different signal-to-noise ratio (SNR) conditions and assumed channel lengths $\hat{n}_a$, respectively. Table 1 summarises the results (mean±variance) for channel 1 with a SNR of 20 dB. Results for channel 2 are similarly given in Figs. 3 and 4 and Table 2.

Some observations can be drawn. The GA based scheme always finds a global optimal channel estimate and the optimisation process converges fast. Compared with other existing methods of HOC fitting, our method is more accurate and robust, as is demonstrated by very small variances of estimated channel taps. Our method is capable of identifying nonexisting channel taps with (near) zero values (at least an order smaller than values for existing taps). Thus, model order selection can simply be carried out by first assuming an overlength $\hat{n}_a$ and then inspecting the obtained channel estimate to delete those insignificant taps. We also performed a range of simulations with a data length of 25000 samples and 16-PAM symbols. The results, not shown here, also confirm the above observations.

6.  Conclusions 

A GA based scheme has been developed for blind channel identification based on HOC fitting. Apart from ensuring a global optimal channel estimate regardless of initial conditions, the proposed method is highly accurate and very robust. The small variances of channel estimates and the insensitivity to noise achieved in our simulation have not been reported previously for other existing methods. Our application also demonstrates the advantages of using the micro-GA for fast global optimisation of multimodal cost functions.

Figure 1. Cost function versus number of function evaluations for channel 1.

Figure 2. Mean tap error versus number of function evaluations for channel 1.

        true        estimate (mean±variance)
        n_a = 4     n̂_a = 4             n̂_a = 5             n̂_a = 6
â_0     -0.21       -0.20975±0.00014    -0.21124±0.00022    -0.21098±0.00020
â_1     -0.50       -0.49957±0.00003    -0.49975±0.00004    -0.49835±0.00005
â_2      0.72        0.72092±0.00002     0.71965±0.00002     0.72087±0.00003
â_3      0.36        0.35788±0.00006     0.35859±0.00010     0.35823±0.00011
â_4      0.21        0.21073±0.00016     0.21058±0.00013     0.20870±0.00017
â_5                                     -0.00041±0.00050    -0.00184±0.00050
â_6                                                         -0.00173±0.00055

Table 1. Identification results for channel 1 with 8-PAM and SNR = 20 dB.

        true        estimate (mean±variance)
        n_a = 4     n̂_a = 4             n̂_a = 5             n̂_a = 6
â_0      0.227       0.22731±0.00080     0.22221±0.00099     0.21743±0.00119
â_1      0.460       0.45727±0.00097     0.45679±0.00127     0.45481±0.00242
â_2      0.688       0.68913±0.00070     0.68095±0.00066     0.67408±0.00181
â_3      0.460       0.45744±0.00052     0.46426±0.00077     0.46870±0.00262
â_4      0.227       0.22564±0.00079     0.22322±0.00074     0.22265±0.00217
â_5                                      0.01577±0.00781     0.01514±0.00653
â_6                                                          0.00359±0.00485

Table 2. Identification results for channel 2 with 8-PAM and SNR = 20 dB.

Figure 3. Cost function versus number of function evaluations for channel 2.

Figure 4. Mean tap error versus number of function evaluations for channel 2.

Abstract

In this paper we give an analytic optimal solution to the identification problem of non-minimum phase systems using the fourth order spectra. We show that this solution is, in first approximation, equivalent to the solution given by the well-known kurtosis maximization method. The proposed solution gives the phase of the system transfer function; the modulus can be obtained from the second order statistics. However, this solution requires a trispectrum phase unwrapping, as the trispectrum phase is known in the interval $[-\pi, \pi[$ but needs to be unwrapped in the interval $[-4\pi, 4\pi[$ in order to obtain the optimal solution. Therefore, we present different phase unwrapping solutions. Next, we propose a method to improve the trispectrum phase estimation using a factorizability condition. Simulation results are given and the algorithm shows good behavior even with few data.
Keywords: HOS - Fourth Order Statistics - Blind Identification.


1.  INTRODUCTION 

Fourth order statistics of complex signals are among the possible tools for treating the problem of channel identification and equalization in digital communications [1]. On one hand, time domain methods have been studied for several years [2],[1]. These are parametric methods and require a correct determination of the system order. On the other hand, some frequency domain methods have been developed [1],[3],[4],[5]. Pierce [6],[7] and Shalvi & Weinstein [8] have treated the case of complex signals with applications in the field of radar signals and equalization respectively.

In this paper, we present an optimal solution to the phase reconstruction problem that uses all the trispectrum information. However, this solution requires a previous phase unwrapping step to determine the trispectrum phase in the interval $[-4\pi; 4\pi[$. In order to ensure this correct determination, unwrapping methods [9] require that the maximum error on the estimated trispectrum phase is less than $\pi/4$. In this paper, we develop different phase unwrapping solutions. Next, we present a method to improve the trispectrum phase estimation. This method is based on a factorizability condition [10]. Indeed, the optimal trispectrum obtained from the measurements is the factorizable trispectrum which minimizes the distance from the measured values. Finally, simulation results show the enhancement provided by this improvement method.

2.  DEFINITIONS 

2.1.  Fourth  order  spectrum  of  complex  signals 

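Our developments are based on the fourth order cumulant which never vanishes even under circularity hypothesis [11][12]. The fourth order cumulant of a complex random sequence $x(t)$ is given by:

$$\begin{aligned} C_4^x(t_1,t_2,t_3) ={}& E\{x^*(\tau)x(\tau+t_1)x(\tau+t_2)x^*(\tau+t_3)\} \\ & - E\{x^*(\tau)x(\tau+t_1)\}\,E\{x(\tau+t_2)x^*(\tau+t_3)\} \\ & - E\{x^*(\tau)x(\tau+t_2)\}\,E\{x(\tau+t_1)x^*(\tau+t_3)\} \\ & - E\{x(\tau+t_1)x(\tau+t_2)\}\,E\{x^*(\tau)x^*(\tau+t_3)\}, \qquad (1) \end{aligned}$$

and the corresponding fourth order spectrum is [6][7]:

$$\begin{aligned} S_4^x(\omega_1,\omega_2,\omega_3) ={}& E\{X(\omega_1)X(\omega_2)X^*(-\omega_3)X^*(\omega_1+\omega_2+\omega_3)\} \\ & - E\{X(\omega_1)X^*(\omega_1)\}\,E\{X(\omega_2)X^*(\omega_2)\}\,\delta(\omega_2+\omega_3) \\ & - E\{X(\omega_2)X^*(\omega_2)\}\,E\{X(\omega_1)X^*(\omega_1)\}\,\delta(\omega_1+\omega_3) \\ & - E\{X^*(\omega_3)X^*(-\omega_3)\}\,E\{X(\omega_1)X(-\omega_1)\}\,\delta(\omega_1+\omega_2), \qquad (2) \end{aligned}$$

where $X(\omega)$ is the Fourier transform of the sequence $x(t)$ and $\delta(\omega) = 1$ if $\omega = 0$ and $= 0$ elsewhere.

• Remark: the three planes appearing in (2) hold no phase information.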

•  Note  that  this  definition  is  equivalent  to  the  one  given 
in  [13]  if  we  note  ^3  = 


0-8186-8005-9/97  $10.00  ©  1997  IEEE 




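2.2. Non-redundant trispectrum domain for complex signals

The trispectrum definition implies the following symmetries:

$$\begin{aligned} \psi(\omega_1,\omega_2,\omega_3) &= -\psi(\omega_1+\omega_2+\omega_3,\ -\omega_3,\ -\omega_1) \\ &= -\psi(\omega_1+\omega_2+\omega_3,\ -\omega_3,\ -\omega_2) \\ &= -\psi(-\omega_3,\ \omega_1+\omega_2+\omega_3,\ -\omega_1) \\ &= -\psi(-\omega_3,\ \omega_1+\omega_2+\omega_3,\ -\omega_2) \\ &= \psi(\omega_1,\ \omega_2,\ -(\omega_1+\omega_2+\omega_3)) \\ &= \psi(\omega_2,\ \omega_1,\ -(\omega_1+\omega_2+\omega_3)) \end{aligned}$$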

Using these trispectrum symmetries, the non-redundant trispectrum support is defined as follows:

•  Inside  the  stationary  domain : 

^  ^15^25^3  <  ^  +  L02  -V  <  \\ 

U2  <  — tt^3  ^1  ^  ""^3  c<;i  >  a;2  « 


Figure  1.  Non  redundant  domain  Inside  the  sta¬ 
tionary  support  (|  of  the  entire  domain  inside  the 
stationary  support). 

•  Outside  the  stationary  domain : 

^l,^2)^3  <  2  +  0^2  +  ^3  >  2» 

^1  ^  ^2  Wi  +  0^2  +  2a;3  >  1. 


Figure  2.  Non  redundant  domain  outside  the  sta¬ 
tionary  support  (|  of  the  entire  domain  outside  the 
stationary  support). 

23.  Identification  using  the  trispectrum 


x(t) 


h(t) 

H(co) 


Figure  3.  Identification  scheme. 


y(t) 


If the analyzed signal is the output of an LTI system driven
by a non-Gaussian, zero-mean, IID complex sequence $x(t)$
(Fig. 3), the fourth-order spectrum phase satisfies:

$\psi_4^y(\omega_1,\omega_2,\omega_3) = \varphi^h(\omega_1) + \varphi^h(\omega_2) - \varphi^h(-\omega_3) - \varphi^h(\omega_1+\omega_2+\omega_3) + k\pi$, (3)

where $\psi_4^y(\omega_1,\omega_2,\omega_3)$ is the output trispectrum phase,
$\varphi^h(\omega)$ is the system Fourier transform phase and $k = 0$ or
$1$ depending on the input kurtosis sign. However, since the input
kurtosis sign is equal to the output kurtosis sign, the
value of $k$ is known from the output measurements, and (3)
can be written:

$\psi_4(\omega_1,\omega_2,\omega_3) = \psi_4^y(\omega_1,\omega_2,\omega_3) - k\pi = \varphi^h(\omega_1) + \varphi^h(\omega_2) - \varphi^h(-\omega_3) - \varphi^h(\omega_1+\omega_2+\omega_3)$. (4)

In this paper, we use the fourth-order statistics to reconstruct
the transfer function phase only, as its magnitude can be
obtained from the second-order statistics when there is no
measurement noise.


3. LEAST SQUARES PHASE RECONSTRUCTION FROM THE TRISPECTRUM PHASE

The optimal phase reconstruction obtained from the
measured trispectrum phase $\psi_4(\omega_1,\omega_2,\omega_3)$ uses all redundant
trispectrum measures. However, this reconstruction requires
a prior phase unwrapping (cf. [9]). The criterion and
the general formula for real signals are given in [4]. In the
complex case, the optimal fourth-order solution is given by
the phase which minimizes:

$\sum_{\omega_1=0}^{N-1}\sum_{\omega_2=0}^{N-1}\sum_{\omega_3=0}^{N-1}\left[\psi_4(\omega_1,\omega_2,\omega_3) - \psi_4^{\varphi}(\omega_1,\omega_2,\omega_3)\right]^2$, (5)

where

$\psi_4^{\varphi}(\omega_1,\omega_2,\omega_3) = \varphi(\omega_1) + \varphi(\omega_2) - \varphi(-\omega_3) - \varphi(\omega_1+\omega_2+\omega_3)$. (6)

The minimum is obtained when

$\varphi(\omega) = \frac{1}{N^2}\sum_{\omega_1=0}^{N-1}\sum_{\omega_2=0}^{N-1}\psi_4(\omega,\omega_1,\omega_2) + K$, (7)

where $K$ is an arbitrary constant. However, due to the
divisions in (7), the value of $\psi_4(\omega_1,\omega_2,\omega_3)$ must be known
in the interval $[-4\pi, 4\pi[$, but from the measurements this
value is only known in $[-\pi, \pi[$. In section 5, we will give
different algorithms to obtain the unwrapped phase from its
wrapped value.
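
As a numerical illustration of the averaging in (7), the following minimal sketch builds a noise-free, already unwrapped trispectrum phase from a known system phase and checks that the double average returns that phase up to a constant. It assumes a periodic $N$-point frequency grid (indices taken modulo $N$) and a $1/N^2$ normalization; the function names `trispectrum_phase` and `ls_phase` are hypothetical and not taken from the paper.

```python
import math
import random

def trispectrum_phase(phi, w1, w2, w3):
    """Model phase of eq. (6) on a periodic N-point grid (assumed convention)."""
    n = len(phi)
    return (phi[w1 % n] + phi[w2 % n]
            - phi[(-w3) % n] - phi[(w1 + w2 + w3) % n])

def ls_phase(psi4, n):
    """Average psi4(w, w1, w2) over w1 and w2, i.e. eq. (7) with K = 0."""
    return [sum(psi4(w, w1, w2) for w1 in range(n) for w2 in range(n)) / n ** 2
            for w in range(n)]

random.seed(0)
N = 8
phi_true = [random.uniform(-math.pi, math.pi) for _ in range(N)]
phi_hat = ls_phase(lambda a, b, c: trispectrum_phase(phi_true, a, b, c), N)

# The recovered phase differs from the true one by the same constant everywhere.
offsets = [h - t for h, t in zip(phi_hat, phi_true)]
```

In this idealized setting the offset equals minus the mean of the true phases, which plays the role of the arbitrary constant $K$; with measured (wrapped, noisy) phases the unwrapping step discussed in section 5 is needed first.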
4. IDENTIFICATION AND KURTOSIS MAXIMIZATION

This  section  shows  that  the  previous  least-squares  iden¬ 
tification  is  in  first  approximation  equivalent  to  the  solution 
given  by  the  kurtosis  maximization  criterion. 


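$\psi_4^y(\omega_1,\omega_2,\omega_3) - k\pi$ by $\psi_4(\omega_1,\omega_2,\omega_3)$ (cf. (4)).

Finally, (10) becomes:

$J = \sum_{\omega_1=0}^{N-1}\sum_{\omega_2=0}^{N-1}\sum_{\omega_3=0}^{N-1}\cos\left[\psi_4(\omega_1,\omega_2,\omega_3) - \psi_4^{\varphi}(\omega_1,\omega_2,\omega_3)\right]$. (11)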


4.1.  Kurtosis  maximization  criterion 

The scheme used in the equalization context is given in
Figure 4. The criterion was proposed by D. Donoho and
later by O. Shalvi & E. Weinstein [14] [15] [16] in order
to recover the input sequence $x(t)$. They estimate the inverse filter $\hat{h}^{-1}(t)$
through the maximization of $|K(z)|$ under a power constraint
on the equalizer output $z$, where $K(z)$ is the kurtosis of $z$.

Figure 4. Equalization scheme (blocks: $x(t)$, $h(t)$, $y(t)$, $\hat{h}^{-1}(t)$).

As a first step, O. Shalvi & E. Weinstein propose to whiten
the output signal; in this way, they reconstruct the channel
Fourier transform phase, as in the frequency-domain
reconstruction algorithms.


4.2. Expression of the kurtosis maximization in the
frequency domain

The  kurtosis  of  the  output  sequence  z{t)  is  given  by  : 


4.3.  Taylor  expansion  of  the  kurtosis 

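If the trispectrum is factorizable and its phase is accurately
estimated, the difference between $\psi_4(\omega_1,\omega_2,\omega_3)$
and $\psi_4^{\varphi}(\omega_1,\omega_2,\omega_3)$ will be small and we can expand
eq. (11):

$J = \sum_{\omega_1=0}^{N-1}\sum_{\omega_2=0}^{N-1}\sum_{\omega_3=0}^{N-1}\left\{1 - \frac{1}{2}\left[\psi_4(\omega_1,\omega_2,\omega_3) - \psi_4^{\varphi}(\omega_1,\omega_2,\omega_3)\right]^2 + \frac{1}{24}\left[\psi_4(\omega_1,\omega_2,\omega_3) - \psi_4^{\varphi}(\omega_1,\omega_2,\omega_3)\right]^4 + \ldots\right\}$. (12)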


If we limit the expansion to the second term, the maximization
of this criterion reduces to the minimization of
the LS criterion obtained in the frequency domain (5). Under
this hypothesis, kurtosis maximization reduces to the
minimization of a quadratic criterion for which the analytic
solution is known (7). However, it requires a phase unwrapping,
which has a solution if the trispectrum phase estimate
is accurate enough.
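
The link between the cosine criterion and the least-squares criterion can be checked numerically. The sketch below is only an illustration on a flat list of made-up phase errors (the values in `small` and `worse` are not from the paper): for small errors, the sum of cosines is close to the number of terms minus half the sum of squared errors, so a larger squared error gives a smaller cosine criterion.

```python
import math

def kurtosis_criterion(errors):
    """Sum of cos(error): the form of the criterion in eq. (11)."""
    return sum(math.cos(e) for e in errors)

def quadratic_approximation(errors):
    """First two terms of the expansion (12): count - 0.5 * sum of squares."""
    return len(errors) - 0.5 * sum(e * e for e in errors)

# Hypothetical phase errors in radians, chosen only for illustration.
small = [0.05, -0.02, 0.08, -0.04]
worse = [0.10, -0.04, 0.16, -0.08]

gap = abs(kurtosis_criterion(small) - quadratic_approximation(small))
```

The remaining gap is of the order of the fourth-order term in (12), which is why the equivalence holds only to a first approximation.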


Kiz)  =  C|(0,0,0)  =  f  E  E  T^(wi,W2,W3) 

N~1  N--1  N-1 

=  X  X  X 

0^1=0  a’2~0  W3  =  0 

Since $K(z)$ is a real number, its modulus is:


N-1  N-1  N-1 

aii=0a’2=0ii;3=0  (9) 


where $k = 1$ if $K(z) < 0$ and $k = 0$ if $K(z) > 0$.

•  If  the  output  is  whitened,  |T^  |  is  constant. 

• $|K(z)|$ being a positive real number, the complex exponentials
in (9) are replaced by their real part.

Then  the  Fourier  phase  maximizing  the  kurtosis  of  the  out¬ 
put  is  the  phase  which  maximizes : 


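$J = \dfrac{|K(z)|}{|T|} = \sum_{\omega_1=0}^{N-1}\sum_{\omega_2=0}^{N-1}\sum_{\omega_3=0}^{N-1}\cos\left[\psi_4^y(\omega_1,\omega_2,\omega_3) - \psi_4^{\varphi}(\omega_1,\omega_2,\omega_3) - k\pi\right]$. (10)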


5.  PHASE  UNWRAPPING  ALGORITHMS 

Some phase unwrapping techniques are described in this
section:

- the first solution extends Marron's bispectrum phase unwrapping
to the trispectrum case;

- the second method first uses a recursive algorithm
[17] to obtain a first approximation of the Fourier phases.
The modulo $8\pi$ trispectrum phases obtained from
these phases using (4) give the interval in which the
unwrapped value of $\psi_4(\omega_1,\omega_2,\omega_3)$ must lie;

- finally, a method combining a recursive algorithm and the
optimal method of section 3 is presented.

5.1. Extension of Marron's bispectrum phase unwrapping
algorithm

Marron et al. [9] have shown that it is possible to deduce
the unwrapped bispectrum phases $\psi_3(\omega_1,\omega_2)$ from the
$N - 1$ modulo $2\pi$ bispectrum phases:

$\psi_3(1,\omega)$, $\omega = 1, 2, \ldots, N-1$.

To obtain the unwrapped phases, they use the following
equation:

• Since the kurtosis sign of $z$ is the kurtosis sign of $x$,
the value of $k$ is the same as in (3); then we can replace


$\psi_3(\omega_1,\omega_2) = \psi_3(\omega_1-1,\,\omega_2+1) + \psi_3(1,\,\omega_2) - \psi_3(1,\,\omega_1-1)$. (13)
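
Recursion (13) can be exercised on synthetic data. The sketch below is a simplified check under stated assumptions: it uses the factorized bispectrum phase $\psi_3(a,b) = \varphi(a) + \varphi(b) - \varphi(a+b)$ with a randomly drawn $\varphi$, non-negative integer frequency indices, and no wrapping. The helper names are hypothetical and this is not the implementation of [9].

```python
import random

def bispectrum_phase(phi, w1, w2):
    """Factorized bispectrum phase (assumed model): phi(w1) + phi(w2) - phi(w1 + w2)."""
    return phi[w1] + phi[w2] - phi[w1 + w2]

def rebuild_from_first_row(row, n):
    """Rebuild psi3(w1, w2) for w1, w2 >= 1 and w1 + w2 <= n using recursion (13).

    row[w] holds psi3(1, w) for w = 1..n-1 (row[0] is unused).
    """
    psi3 = {(1, w): row[w] for w in range(1, n)}
    for w1 in range(2, n):
        for w2 in range(1, n - w1 + 1):
            psi3[(w1, w2)] = psi3[(w1 - 1, w2 + 1)] + row[w2] - row[w1 - 1]
    return psi3

random.seed(1)
N = 8
phi = [random.uniform(-3.0, 3.0) for _ in range(N + 1)]
first_row = [None] + [bispectrum_phase(phi, 1, w) for w in range(1, N)]
rebuilt = rebuild_from_first_row(first_row, N)
max_error = max(abs(v - bispectrum_phase(phi, a, b)) for (a, b), v in rebuilt.items())
```

Because every value is built from the first row by additions only, no modulo $2\pi$ ambiguity is introduced along the way, which is the property the unwrapping method relies on.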
Marron's algorithm is extended to the fourth-order
spectrum of complex signals as follows [18]:

$\psi_4(z+v+w,\,x,\,y) + \psi_4(w,\,v,\,z) = \psi_4(w,\,x,\,y) + \psi_4(w+x+y,\,v,\,z)$, (14)

for all $v, w, x, y, z$.

This formula can be generalized to any order. It is always a
four-term identity.
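
The four-term identity (14) can be verified directly for a factorizable phase. The sketch below assumes the phase relation of (4), $\psi_4(a,b,c) = \varphi(a) + \varphi(b) - \varphi(-c) - \varphi(a+b+c)$, and uses an arbitrary made-up function `phi` on integer frequencies; both the function and the helper names are illustrative assumptions.

```python
import math
import random

def phi(k):
    """Arbitrary test phase on integer frequencies (hypothetical, for illustration)."""
    return math.sin(1.7 * k) + 0.05 * k * k

def psi4(a, b, c):
    """Factorized trispectrum phase, following the form of eq. (4)."""
    return phi(a) + phi(b) - phi(-c) - phi(a + b + c)

def four_term_gap(v, w, x, y, z):
    """Left-hand side minus right-hand side of identity (14)."""
    lhs = psi4(z + v + w, x, y) + psi4(w, v, z)
    rhs = psi4(w, x, y) + psi4(w + x + y, v, z)
    return lhs - rhs

rng = random.Random(2)
gaps = [four_term_gap(*(rng.randint(-10, 10) for _ in range(5))) for _ in range(200)]
```

The gap vanishes (up to rounding) for any choice of `phi`, since the intermediate terms $\varphi(w+x+y)$ and $\varphi(z+v+w)$ cancel on each side.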

It is possible to deduce all the unwrapped trispectrum
phases from the $(N-1)$ modulo $2\pi$ trispectrum phases used
in a multiresolution recursive algorithm using (14) [18].

The trispectrum phases deduced by this unwrapping process
are compared to the measured modulo $2\pi$ trispectrum
phases, and a phase of $2q\pi$ is added to these measured values
so that their difference will be less than $\pi$.
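
The last step, adding a multiple of $2\pi$ to a measured phase so that it lies within $\pi$ of a predicted value, can be sketched as follows. The function name `align_to_reference` and the sample values are hypothetical.

```python
import math

def align_to_reference(measured, reference):
    """Add 2*q*pi to `measured` so that it lies within pi of `reference`."""
    q = round((reference - measured) / (2.0 * math.pi))
    return measured + 2.0 * math.pi * q

measured = 0.4                      # wrapped measurement, in [-pi, pi[
reference = 0.4 + 4.0 * math.pi + 0.3  # value predicted by the recursion
aligned = align_to_reference(measured, reference)
```

Here the integer $q$ is simply the nearest integer to the phase gap divided by $2\pi$; the measured value keeps its fine detail while the reference only decides which $2\pi$ branch it belongs to.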

There  are  other  approaches  for  performing  phase  unwrap¬ 
ping.  The  next  two  methods  are  also  efficient  in  practice. 

5.2. Multiresolution used to unwrap the trispectrum
phase

The multiresolution method [17] gives a first approximation
of the channel phases. This recursive algorithm
does not need any phase unwrapping. Those values are then
used to compute all the modulo $8\pi$ trispectrum phases in the
interval $[-4\pi, 4\pi[$ using equation (4).

5.3. Multiresolution combined with the optimal
method

Such a combination is possible thanks to the recursive
structure of the multiresolution method [17]. The multiresolution
method is combined with the optimal method as follows:

- at step $n$ of the algorithm, the multiresolution
method gives a first estimate of the channel phases for
$m = 0, 1, \ldots, 2^{n-1} - 1$;

- these values, and those calculated in the previous steps
(at lower resolutions), are used to unwrap the trispectrum
phases $\psi_4$ for $p, q, r = 0, \ldots, 2^n - 1$ using (4);

- next, the LS estimation (7) uses these trispectrum
phases to give an improved estimate of the channel phases for
$m = 0, 1, \ldots, 2^{n-1} - 1$;

- finally, these last estimates are used to initialize the
next step $(n + 1)$ of the algorithm.

6. Trispectrum estimation enhancement

The main drawback of the projection method of section 3
is the difficulty of performing phase unwrapping correctly
when the trispectrum estimate is very noisy. We
propose to address this problem by improving the
trispectrum estimate. We look for the trispectrum phase
$\psi_{FACTO}(\omega_1,\omega_2,\omega_3)$ which

• satisfies the factorizability condition [10]:

$\psi_{FACTO}(\omega_1,\omega_2,\omega_3) = \psi_{FACTO}(\omega_1,\,x,\,y) + \psi_{FACTO}(\omega_1+x+y,\,\omega_2,\,\omega_3) - \psi_{FACTO}(\omega_1+\omega_2+\omega_3,\,x,\,y)$ (15)

$\forall\,\omega_1,\omega_2,\omega_3$ for a given $(x, y)$;

• and minimizes:

$\sum_{\omega_1=0}^{N-1}\sum_{\omega_2=0}^{N-1}\sum_{\omega_3=0}^{N-1}\left[\psi(\omega_1,\omega_2,\omega_3) - \psi_{FACTO}(\omega_1,\omega_2,\omega_3)\right]^2$. (16)

Note that these two conditions (eq. 15 and eq. 16)
need only be satisfied for the modulo $2\pi$ values of
$\psi_{FACTO}(\omega_1,\omega_2,\omega_3)$ and $\psi(\omega_1,\omega_2,\omega_3)$.

The following recursion gives good results in practice:
starting from the measured trispectrum phases, we compute
recursively:


(17) 


EE 

ar=0  y=0  ^ 


Moreover, we impose that the estimated phase satisfies the
trispectrum symmetries (cf. section 2.2). In our simulations,
the improved trispectrum was obtained after about 5
iterations.


7.  SIMULATION  RESULTS 

The channel used in the simulations is a 5th-order filter.
The input was a 4-QAM IID signal. The system is defined
by:

$y[k] = x[k] + 0.1\,x[k-1] - 1.87\,x[k-2] + 3.02\,x[k-3] - 1.435\,x[k-4] + 0.49\,x[k-5]$.

Note that this channel has two zeros close to the unit circle ($\rho = 0.99$).
The trispectrum is estimated from 450 samples of
the observed signal using the direct method. Figure
5 shows the results obtained from the optimal phase reconstruction
algorithm of section 3 using the phase unwrapping
method of section 5.3. This poor result is due to errors
in the phase unwrapping step. Indeed, the trispectrum estimate
computed from so few data is very noisy. In figure
6, we first apply the trispectrum enhancement method before
the phase reconstruction step. This result illustrates the
advantage of using such an enhancement method. Finally,
figure 7 shows the mean results and the standard deviations
obtained from 50 Monte Carlo runs under the same
conditions as the results of figure 6. As the simulation
results show, this method yields good results, even when
few data are used and errors on the trispectrum phase
exceed $\pi/4$.
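
For readers who want to reproduce the setting, the observed signal can be generated along the following lines. This is a minimal sketch, assuming unit-amplitude 4-QAM symbols ($\pm 1 \pm j$), zero initial conditions and no measurement noise; the helper names are hypothetical, and the trispectrum estimation itself is not shown.

```python
import random

# Taps of the 5th-order channel given in the text.
CHANNEL = [1.0, 0.1, -1.87, 3.02, -1.435, 0.49]

def qam4_symbols(count, rng):
    """IID 4-QAM symbols with real and imaginary parts in {-1, +1} (assumed scaling)."""
    return [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(count)]

def fir_filter(taps, x):
    """y[k] = sum_i taps[i] * x[k - i], with x[k] = 0 for k < 0."""
    return [sum(taps[i] * x[k - i] for i in range(len(taps)) if k - i >= 0)
            for k in range(len(x))]

rng = random.Random(0)
x = qam4_symbols(450, rng)   # 450 samples, as in the simulation
y = fir_filter(CHANNEL, x)   # observed channel output
```

Feeding a unit impulse through `fir_filter` returns the taps themselves, which is a quick sanity check of the convolution.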
8. CONCLUSIONS

The reconstruction combining phase unwrapping and the
projection formula (7) gives acceptable results, at least
when phase unwrapping works well. However, if the phase
is not correctly unwrapped, the quality of the reconstruction
degrades. In this case, the search for an optimal
solution under the factorizability constraint, as proposed in
section 6, appears to improve the results significantly, even when
fewer than one thousand signal samples are available.
This seems to confirm that the principal difficulty in blind
identification based on higher-order spectral analysis lies
essentially in accurate phase unwrapping.


Figure  5.  Impulse  response  reconstruction  using 
the  combination  method. 


Figure 6. Impulse response reconstruction using
first the trispectrum enhancement method, and
then the combination method.
Abstract

This paper proposes a scheme for maximum-likelihood-based
signal reconstruction. The scheme
extends a previous work by Yellin and Friedlander
to the case of fractionally-spaced data. The use of
fractionally-spaced data obviates the need for timing-phase
recovery. Batch and adaptive algorithms are derived
and illustrated by examples. The algorithms are
useful for equalization of digital communication channels.


1.  Introduction 

Yellin  and  Friedlander,  in  [1],  [2],  and  [3]  presented 
a  family  of  algorithms  for  multi-channel  blind  signal 
reconstruction  based  on  maximum  likelihood.  The  un¬ 
derlying  assumption  in  these  works  is  that  the  source 
signals  are  i.i.d.,  mutually  independent,  and  the  re¬ 
ceived  signals  are  sampled  at  the  symbol  rate.  In  this 
paper  we  extend  the  work  [1]  to  the  case  of  sampling 
at  higher  than  the  symbol  rate. 

The main advantage of sampling at higher than the
symbol rate is reduced sensitivity to timing phase errors.
When sampling at the symbol rate, the sampling
instants must coincide with the most open point of the
eye. Usually, a communication channel has bandwidth
larger than half the symbol rate (so-called excess bandwidth),
so a sampling rate equal to the symbol rate
violates Nyquist's rule. Elimination (or minimization)
of intersymbol interference then depends critically on
accurate knowledge of the channel's impulse (or frequency)
response and on accurate timing. On the other
hand, when the sampling rate is increased to meet
Nyquist's condition, no information is lost in the sampling;
therefore we expect to be able to reconstruct the
source signal more reliably. In most practical cases, the
excess bandwidth is less than 100 percent, so doubling
the sampling rate (i.e., sampling twice per symbol) is
sufficient.

*This work was supported by the Office of Naval Research
under Contract No. N00014-95-1-0912 and by the University of
California MICRO program and Applied Signal Technology, Inc.

In  this  paper  we  consider  only  the  case  of  a  single 
source  and  a  single  receiver.  Although  the  probability 
distribution  of  the  source  is  not  restricted  in  principle, 
in  practice  we  are  interested  only  in  discrete  distribu¬ 
tions,  typical  of  communication  signals.  We  propose  a 
fractionally-spaced  equalizer  based  on  maximum  like¬ 
lihood.  The  equalizer’s  coefficients  are  estimated  di¬ 
rectly,  without  explicit  estimation  of  the  channel  re¬ 
sponse.  We  derive  both  batch  and  adaptive  versions  of 
the  algorithm.  Study  of  the  multi-source  multi-receiver 
case  (the  so-called  signal  separation  problem)  is  left  to 
a  future  paper. 

2.  The  Signal  Model 

We use the variable $t$ to denote continuous time, the
integer $n$ to denote the time index of signals sampled
at the symbol rate, and the integer $n'$ to denote the time index of signals
sampled at $L$ times the symbol rate, where $L$ is the
oversampling factor. Parentheses are used for continuous
time and square brackets for discrete time. Let $s[n]$
denote the transmitted symbol sequence, $T$ the symbol
interval, and $c(t)$ the continuous-time response of the
channel to a discrete-time unit sample. Both $s[n]$ and
$c(t)$ are complex valued. The sequence $s[n]$ is assumed
to be zero-mean i.i.d. The continuous-time received
signal is given by

$y(t) = \sum_{n=-\infty}^{\infty} s[n]\,c(t - t_0 - nT)$, (1)

where $t_0$ is an unknown delay. Here and throughout
the derivation we assume that there is no noise.

The discrete-time (oversampled) received signal is
given by

$y[n'] = y(n'T/L) = \sum_{n=-\infty}^{\infty} s[n]\,c\big((n' - nL)(T/L) - t_0\big) = \sum_{n=-\infty}^{\infty} s[n]\,c[n' - nL]$, (2)

where

$c[n'] = c\big(n'(T/L) - t_0\big)$ (3)

is the equivalent discrete-time channel. Note that the
discrete-time channel depends on the delay $t_0$.


for any function $f(\cdot)$ (where $\bar{u}$ denotes the complex conjugate of
$u$). In [1], the main idea is to force the condition (7)
to hold, at least approximately, for a user-chosen nonlinear
function $f(\cdot)$. Different functions give rise to
different equalizers. Choosing $f(u[n]) = u[n]$ is equivalent
to forcing the sequence $u[n]$ to be uncorrelated;
therefore the resulting equalizer belongs to the family
of equalizers based on second-order statistics. As is
well known, such equalizers cannot, in general, contend
with nonminimum-phase channels. This explains why
$f(\cdot)$ must be a nonlinear function. In particular, if $s[n]$
has a continuous and differentiable probability density
function $p(s)$ and we choose


f  dpjs)  dpjs)  \ 
V5SR{s}  ^dQ{s} )  ’ 


(8) 


then the solution of (7) can be regarded as an approximate
maximum likelihood estimate of the equalizer
coefficients [1]. The function in (8) is the score
function of $p(s)$. For discrete distributions, a particularly
attractive choice is


3.  The  Fractionally-Spaced  Equalizer 


$f(s) = s - \langle s \rangle$, (9)


Let us split the received signal $y[n']$ into $L$ parallel
subchannels:

$y^{(l)}[n] = y[nL + l], \quad 0 \le l \le L-1$. (4)

Let the equalizer consist of $L$ "subequalizers", where
the $l$th subequalizer has coefficients $\{h^{(l)}[k],\ 0 \le k \le 2K\}$.
We denote by $\theta$ the set of real parts and imaginary
parts of all $h^{(l)}[k]$ (this set therefore contains
$2(2K+1)L$ elements). The outputs of the subequalizers
are defined as

$u^{(l)}[n] = \sum_{k=0}^{2K} h^{(l)}[k]\,y^{(l)}[n-k]$. (5)

The reconstructed symbol sequence is, by definition,

$u[n] = \sum_{l=0}^{L-1} u^{(l)}[n]$. (6)

We regard the signal $u[n]$ as an estimate of $s[n-K]$; in
other words, ideally we would like to get $u[n] = s[n-K]$
after convergence.
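
The subchannel split and the summation of the subequalizer outputs can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, samples before the start of the record are treated as zero, and the coefficients are passed as a list of $L$ tap lists, each of length $2K+1$.

```python
def split_subchannels(y, L):
    """Polyphase split: subchannel l holds y[n*L + l] for l = 0..L-1."""
    return [y[l::L] for l in range(L)]

def equalizer_output(y, h, L):
    """Sum over l of the subequalizer outputs sum_k h[l][k] * y_l[n - k].

    h[l] is the list of 2K+1 taps of the l-th subequalizer;
    samples with n - k < 0 are taken as zero (an assumption of this sketch).
    """
    subs = split_subchannels(y, L)
    n_out = min(len(s) for s in subs)
    u = []
    for n in range(n_out):
        total = 0j
        for l in range(L):
            for k, tap in enumerate(h[l]):
                if n - k >= 0:
                    total += tap * subs[l][n - k]
        u.append(total)
    return u

# Example with L = 2 and K = 1: only the centre tap of subequalizer 0 is non-zero,
# so the output is subchannel 0 delayed by K = 1 symbol.
u = equalizer_output(list(range(10)), [[0, 1, 0], [0, 0, 0]], 2)
```

With this trivial choice of taps the equalizer simply selects one sampling phase and delays it by $K$ symbols, which matches the convention that $u[n]$ estimates $s[n-K]$.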

If the equalizer were ideal, $u[n]$ would be equal to
$s[n]$; therefore it would be independent of past and future
values of arbitrary functions of $u[n]$, that is, we
would have for all $\tau \neq 0$

$E\{f(u[n])\,\bar{u}[n-\tau]\} = E\{f(u[n])\}\,E\{\bar{u}[n-\tau]\} = 0$, (7)


where $\langle s \rangle$ is the symbol nearest to $s$ in the source alphabet.
The function $f(s)$ in (9) is an approximation of
the score function of a Gaussian mixture density, where
the individual Gaussians are centered around the discrete
symbol values and their widths approach zero; see
[4]. In practice, it reaches the best asymptotic relative
efficiency under noise-free conditions; see [2].
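
A nearest-symbol nonlinearity of this kind is easy to write down for a square QAM alphabet. The sketch below assumes a 16-QAM alphabet with levels $\pm 1, \pm 3$ on each axis (the scaling is an assumption, not stated in the text) and uses hypothetical function names.

```python
LEVELS = (-3, -1, 1, 3)  # assumed per-axis levels of a 16-QAM alphabet

def nearest_symbol(s, levels=LEVELS):
    """Symbol of the square alphabet closest to the complex sample s."""
    re = min(levels, key=lambda a: abs(s.real - a))
    im = min(levels, key=lambda a: abs(s.imag - a))
    return complex(re, im)

def f(s):
    """Decision error s - <s>, the form of the nonlinearity in eq. (9)."""
    return s - nearest_symbol(s)

error = f(0.8 + 2.6j)  # distance of a noisy sample from its nearest symbol
```

For a square alphabet the nearest symbol can be found independently on the real and imaginary axes, which is what the two `min` calls do; `f` is zero exactly at the alphabet points.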

The main idea in this work is to replace (7) by the
stronger condition

$E\{f(u[n])\,\bar{u}^{(l)}[n-\tau]\} = 0$, for all $\tau \neq 0$ and $0 \le l \le L-1$. (10)

Our premise is that, if we can force (10) to hold, at
least approximately, then (7) will hold a fortiori. In
practice, we will consider (10) only for $|\tau| \le Q$, where
$Q$ is a user-chosen parameter satisfying $Q > K$. We
assume that enough data $y[n']$ are available to generate
$f(u[n])$ and $\bar{u}^{(l)}[n-\tau]$ in the range $1 \le n \le N$ for all
$-Q \le \tau \le Q$. We define

H  =  [n  -  t] 

n=l 

-E{f(u[n])u^%]}6[T], 

0  ^  ^  ^  ^  —  Ij  ’^Q  ^  T  <  Q. 

(11) 

Our  aim  is  to  cause  the  [r]  to  be  as  close  to  zero  as 
possible. To this end, we will introduce a cost function

l=zO  r——Q 


and seek to minimize $V$ as a function of the subequalizers'
coefficients.

Batch-type algorithms for minimization of the cost
function $V$ can be derived using standard optimization
techniques: steepest descent, Newton-Gauss, quasi-Newton,
and so on. Such optimization methods rely
on analytic expressions for the partial derivatives of
the cost function with respect to the parameters. We
now derive expressions for these derivatives. Differentiation
of the cost function with respect to an arbitrary
parameter $\theta$ gives


-£/— 1  Q 

w  =  EE/ 

1=0  t=—Q 


ae  j  ’ 


(13) 


where 


ae  N  ae  ^  ' 

n—1 


+  f{u[n]) 


au^^^  \n  —  t] 


(14) 


In the complex case, $f(u)$ is not always a differentiable
(analytic) function of $u$; however, when it is not,
it is usually differentiable with respect to $u$ and $\bar{u}$ separately.
For example, the function $f(u) = |u|^2 = u\bar{u}$ is
not analytic in $u$, but we have


Then 


=  „<■»>  Mi 


(16a) 

(16b) 


and 

(17a) 
(17b) 

Substitute  (16)  and  (17)  in  (15)  to  get 


du\n\ 


y^^\n-k]. 


ah'r\k] 


ahi'^^k] 


ah^\k] 


;^^{5i(uW)y("*)[n-fc] 

n—1 

+  92{u[n])y^'^''{n  -  k]}u^'''^{n  -  t] 

+  f{u{n])y^^^[n-T-k]6[l-m], 

(18a) 

^Y^{9i{u[n])y^'^'^[n-k] 

n—1 

-  92{uln\)y^'^\n  -  k]u^^\n  -  t] 

—  f{u[n])y^‘^'’[n  —  t  —  fc]5[Z  —  m]. 

(18b) 


4.  An  Adaptive  Fractionally-Spaced 
Equalizer 


$\dfrac{\partial f(u)}{\partial u} = \bar{u}, \qquad \dfrac{\partial f(u)}{\partial \bar{u}} = u.$

Denote by $g_1(u)$ the partial derivative of $f(u)$ with respect
to $u$ and by $g_2(u)$ its partial derivative with respect
to $\bar{u}$. Then


ae 


N 


du[n] 


92{u[n]) 


du[n] 


+  /(^N) 


[n  —  r] 


(15) 


An adaptive equalizer enables tracking of the parameters
of a time-varying channel. In our case, since
the equivalent discrete-time channel depends on the delay
$t_0$, an adaptive equalizer also enables tracking of a
variable delay, for example due to source or receiver
motion. The adaptive equalizer we derive in this section
is based on the stochastic gradient approach. More
complex algorithms, based on stochastic quasi-Newton,
can be derived but are not discussed here.

We  assign  the  subscript  “old”  to  quantities  based 
on  data  up  to  time  n  -  1,  and  the  subscript  “new”  to 
quantities  based  on  data  up  to  time  n.  In  particular, 
we  define 

^niwM  =  +  f{u[n])u^‘'>[n  -  t] 

-  E{f{u[n])u^^'>{n]}6[T], 

0<1<L-1,  -Q<t<Q, 


Note that, for $f(\cdot)$ given in (9), $g_1(\cdot) = 1$ and $g_2(\cdot) = 0$.
Let

$h^{(l)}[k] = h_r^{(l)}[k] + j\,h_i^{(l)}[k]$.
(19)


where $\lambda < 1$ for exponentially decaying memory, and
$\lambda = 1$ for growing memory. Correspondingly,

E  (20) 

1=0  T——Q 


The  partial  derivatives  of  VJ, 

«T/ 

““E  E 

/=0  t=-Q 


de 


new  axe  given  by 

r]- 


(21) 


An  assumption  commonly  made  in  deriving  adaptive 
algorithms  is  that  the  algorithm  has  already  converged 
to  a  stationary  point  at  time  n  —  1,  meaning  that 


We  also  assume  that  and  JJnew  are  sufficiently 
close  to  each  other,  so  that  (22)  implies 


dVoid 

de 


L-l  Q  ( 

E  E 

;=o  T——Q  L 


de 


I 


When  we  substitute  (23)  and  (19)  in  (21)  we  get 


de 


L-l  Q 

=  EE 

1=0 


(fli(wN) 


-  r]  +  /(u[n]) 


9u[n] 

'W 

du^^^  [n 


+  92{u[n]) 

Zd]}, 


du\n\ 

~W 


) 


(24) 


The partial derivatives of $u[n]$ and $\bar{u}^{(l)}[n-\tau]$ are as
given in (17) and (16), respectively.

The parameter vector $\theta$ is updated using

1 


^new  —  ^old  mM" 


de 


(25) 


where $\mu[n] = \mu_0/n$ in the growing memory case and
$\mu[n] = \mu_0$ in the decaying memory case (in either case,
$\mu_0$ is a user-chosen parameter).

In summary, the adaptive algorithm performs the
following computations at each time point:

1. Computation of $\{u^{(l)}[n],\ 0 \le l \le L-1\}$ and $u[n]$,
using (5) and (6), respectively, with the current
values of the equalizer coefficients.

2. Updating the quantities defined in (19).

3. Computation of the new instantaneous gradient,
using (24), (17) and (16).

4. Updating the equalizer coefficients, using (25).


5.  Examples 

We illustrate the adaptive equalizer by two examples.
The purpose of these examples is to show the
main advantage of using a fractionally-spaced equalizer,
namely obviating the need for timing recovery.
The source signal in both examples is 16-QAM. The
channel has a raised-cosine impulse response with 40
percent excess bandwidth. When the output of such a
channel is sampled at the symbol rate, it is ISI-free if
the sampling instants are properly timed. In the first
example, the oversampling ratio is $L = 2$ and the sampling
instants are delayed by $0.125T$ and $0.625T$ with
respect to the ideal timing instant. The corresponding
coefficients of the equivalent discrete-time channel
(truncated to length 14) are:

c[n']  =  -  0.0063, 0.0088, 0.0469,  -0.0368,  -0.1593, 

0.1239, 0.7678, 0.9722, 0.4436,  -0.0891,  -0.1191, 
0.0270,0.0324,-0.0053. 
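
The listed coefficients appear consistent with sampling a unit-peak raised-cosine pulse (roll-off 0.4, $T = 1$) at the instants $(n' - 7)/2 + 0.125$. The sketch below recomputes them under that assumption; the centre index 7, the unit-peak normalization and the function names are assumptions of this sketch rather than details stated in the text.

```python
import math

def raised_cosine(t, alpha):
    """Raised-cosine pulse with symbol interval T = 1 and roll-off alpha."""
    if abs(t) < 1e-12:
        return 1.0
    sinc = math.sin(math.pi * t) / (math.pi * t)
    denom = 1.0 - (2.0 * alpha * t) ** 2
    if abs(denom) < 1e-12:
        # Limit value at t = +/- 1 / (2 * alpha).
        return (math.pi / 4.0) * sinc
    return sinc * math.cos(math.pi * alpha * t) / denom

def sampled_channel(alpha, L, delay, length, centre):
    """Samples of the pulse at t = (n - centre) / L + delay, n = 0..length-1."""
    return [raised_cosine((n - centre) / L + delay, alpha) for n in range(length)]

c = sampled_channel(alpha=0.4, L=2, delay=0.125, length=14, centre=7)
```

Even-indexed and odd-indexed samples correspond to the two sampling phases ($0.625T$ and $0.125T$ offsets), i.e. to the two subchannels seen by the fractionally-spaced equalizer.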

The  parameters  of  the  equalizer  were  chosen  as 

if  =  6,  Q  =  8,  /Jt  =  10-^,  A  =  0.999. 

The initial equalizer coefficients were chosen as
$h^{(0)}[K] = h^{(1)}[K] = 0.5$, and $h^{(l)}[k] = 0$ otherwise.

Figure  1  shows  the  equalized  symbol  constellation 
during  four  stages  of  the  equalizer’s  operation.  Part 
a  shows  the  first  500  symbols;  part  b  shows  the  sym¬ 
bols  at  time  points  1501  through  2000;  part  c  shows 
the  symbols  at  time  points  4501  through  5000;  finally, 
part  d  shows  the  symbols  at  time  points  9501  through 
10000.  As  we  see,  the  equalizer  indeed  converges  from 
an  almost  closed-eye  initial  state  to  an  almost  open-eye 
final  state. 

The  second  example  compares  the  fractionally- 
spaced  equalizer  with  a  symbol-rate  equalizer  based 
on  the  same  optimization  criterion.  In  this  example 
we  added  white  noise  with  standard  deviation  0.2  (this 
represents  an  SNR  of  about  20  dB).  When  the  sam¬ 
pling  instants  coincide  with  the  most  open  point  of 
the  eye,  both  equalizers  perform  equally  well.  How¬ 
ever,  when  there  is  a  delay  of  0.25T,  the  symbol-rate 
equalizer  collapses.  This  is  shown  in  parts  a  and  b  of 
Figure  2.  Part  a  shows  the  first  500  symbols  at  the 
output  of  the  symbol-rate  equalizer  and  part  b  shows 
the  symbols  at  time  points  9501  through  10000.  As  we 
see,  the  symbol-rate  equalizer  does  not  converge  in  this 
case.  Parts  c  and  d  show  the  corresponding  symbols  at 
the  output  of  the  fractionally-spaced  equalizer,  when 
the  sampling  instants  are  0.25T  and  0.75r.  As  we  see, 
the  fractionally-spaced  equalizer  properly  converges  to 
an  open-eye  state. 
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Figure  1.  The  equalized  symbol  constellations 
in  the  first  Example:  (a)  time  points  1-500; 
(b)  time  points  1501-2000;  (c)  time  points 
4501-5000;  (d)  time  points  9501-10000. 
Figure 2. The equalized symbol constellations
in the second Example: (a) symbol-rate
equalizer, time points 1-500; (b) symbol-rate
equalizer, time points 9501-10000; (c)
fractionally-spaced equalizer, time points 1-500;
(d) fractionally-spaced equalizer, time
points 9501-10000.
Abstract

A unified class of inverse filter criteria using two cumulants,
which includes Wiggins' criterion, Donoho's
criteria and Tugnait's criteria as special cases, has been
proposed by Chi and Wu for blind deconvolution and
equalization of real single-input single-output (SISO)
linear time-invariant (LTI) systems. In this paper, we
extend this class of (single channel) inverse filter criteria
to a family of multistage and a family of single stage
criteria for deconvolution and equalization of real (or
complex) multi-input multi-output (MIMO) LTI systems
with only non-Gaussian measurements. It can be
shown that the two families of inverse filter criteria lead
to perfect equalization for MIMO systems under some
conditions. Some simulation results for the optimum
inverse filter using gradient-type iterative optimization
algorithms are provided to support the proposed criteria.
Finally, we draw some conclusions.

1.  Introduction 

Multichannel blind deconvolution and equalization
is the problem of estimating a desired signal $\mathbf{u}(n) =
[u_1(n), \ldots, u_p(n)]^T$ from only a set of measurements
$\mathbf{x}(n) = [x_1(n), \ldots, x_q(n)]^T$ given by the following
convolutional model:

$\mathbf{x}(n) = \mathbf{H}(n) * \mathbf{u}(n) = \sum_{k=-\infty}^{\infty} \mathbf{H}(k)\,\mathbf{u}(n-k)$ (1)

where $\mathbf{H}(n)$ is the $q \times p$ impulse response matrix
sequence of a $p$-input $q$-output linear time-invariant
(LTI) system. The problem has recently drawn extensive
attention in wireless communications, such
as mobile communications with asynchronous direct-sequence
code-division multiple access (DS-CDMA),
data communications over dually polarized multipath
channels, and array signal processing for base stations
with spatial division multiple access (SDMA).

*This work is supported by the National Science Council under
Grant NSC 86-2213-E-007-037.

Higher-order statistics (HOS) and inverse filters
have been used for multichannel blind deconvolution
and equalization [1-5]. Inouye and Sato [2] extended
Shalvi and Weinstein's (single channel) inverse filter
criterion [6] to both multistage (MS) and single stage
(SS) multichannel inverse filter criteria. Tugnait [3]
also extended the single channel inverse filter criteria
reported in [7] to MS multichannel inverse filter criteria.
These criteria use only second- and third- or
fourth-order cumulants of signals. On the other hand,
Chi and Wu [8] proposed a class of single channel inverse
filter criteria $J_{r,m}$ using an $r$th-order (even) and
an $m$th-order ($m > r$) cumulant of real data. This class
includes Wiggins' criterion (associated with $J_{2,4}$) [9],
Donoho’s  criteria  (associated  with  J2,m)  [10],  and  Tug¬ 
nait ’s  (single  channel)  criteria  J2,3}  J2A  and  [7]  as 
special  cases.  Note  that  Shalvi  and  Weinstein’s  criteria 
[6]  are  actually  the  same  as  Donoho’s  criteria  for  real 
data.  In  this  paper,  we  extend  this  class  of  inverse  fil¬ 
ter  criteria  Jr,m  to  a  family  of  MS  and  a  family  of  SS 
inverse  filter  criteria  for  real  (or  complex)  multi-input 
multi-output  (MIMO)  LTI  systems. 

2.  Model  assumptions  and  problem  for¬ 
mulation 

Assume that $\mathbf{x}(n)$, $n = 0, \ldots, N-1$, are a given
set of real (or complex) non-Gaussian measurements
generated from (1) with the following assumptions:

(A1) The components $u_i(n)$, $i = 1, \ldots, p$, of the input
vector $\mathbf{u}(n)$ are real (or complex), zero-mean,
i.i.d. non-Gaussian, and all of them are statistically
independent of each other.

(A2) The unknown $p$-input $q$-output LTI system

$\mathcal{H}(z) = \sum_{n=-\infty}^{\infty} \mathbf{H}(n)\,z^{-n}$ (2)

is real (or complex) and stable, with possibly nonminimum
phase.

(A3) $q \ge p$.

(A4) The MIMO LTI system $\mathcal{H}(z)$ is of full rank on
the unit circle, i.e., $\mathrm{rank}\{\mathcal{H}(z)\} = p$ for $|z| = 1$.

As mentioned in [3], the assumptions (A3) and (A4)
are a set of sufficient conditions for the existence of the
stable inverse filter of $\mathcal{H}(z)$.

Assume that $\mathcal{V}(z)$ is a stable $p \times q$ MIMO LTI system
and $V(n)$ is the impulse response matrix sequence of
$\mathcal{V}(z)$. Let

$$e(n) = [e_1(n), \ldots, e_p(n)] = V(n) * x(n) = G(n) * u(n) \qquad (3)$$

where

$$G(n) = V(n) * H(n) \quad \text{(see (1))} \qquad (4)$$

is the $p \times p$ impulse response matrix sequence of the
combined overall system, denoted $\mathcal{G}(z)$.

With $\mathcal{V}(z)$ as an estimate for $\mathcal{H}(z)$'s inverse system,
the goal of multichannel deconvolution and equalization
is to find an optimum $\mathcal{V}(z)$ such that

$$\mathcal{G}(z) = \mathcal{V}(z) \cdot \mathcal{H}(z) = P \cdot \mathcal{D}(z) \qquad (5)$$

(perfect equalization) where $P$ is a (nonsingular) permutation
matrix and $\mathcal{D}(z)$ is a $p \times p$ diagonal matrix
given by

$$\mathcal{D}(z) = \mathrm{diag}(\alpha_1 z^{-\tau_1}, \ldots, \alpha_p z^{-\tau_p}) \qquad (6)$$

in which $\alpha_i$, $i = 1, \ldots, p$, are unknown real (or complex)
scale factors and $\tau_i$, $i = 1, \ldots, p$, are unknown time
delays. Consequently, it can be easily observed from
(3) and (5) that the optimum equalized signal is

$$e(n) = [\alpha_{i_1} u_{i_1}(n - \tau_{i_1}), \ldots, \alpha_{i_p} u_{i_p}(n - \tau_{i_p})] \qquad (7)$$

where $\{i_1, \ldots, i_p\}$ is a permutation of the sequence
$\{1, \ldots, p\}$ associated with $P$.
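
The combined response in (4) and the perfect-equalization condition (5) can be checked numerically on a toy case. The following is a minimal sketch, not the authors' implementation: the helper names (`mat_mul`, `mat_add`, `mimo_convolve`) and the 2 x 2 FIR system are assumptions, chosen so that an exact FIR inverse exists.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_add(a, b):
    """Add two matrices of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mimo_convolve(v_seq, h_seq):
    """G(n) = sum_k V(k) H(n - k) for causal FIR matrix sequences."""
    rows, cols = len(v_seq[0]), len(h_seq[0][0])
    g_seq = [[[0.0] * cols for _ in range(rows)]
             for _ in range(len(v_seq) + len(h_seq) - 1)]
    for k, v_k in enumerate(v_seq):
        for m, h_m in enumerate(h_seq):
            g_seq[k + m] = mat_add(g_seq[k + m], mat_mul(v_k, h_m))
    return g_seq


# Hypothetical channel H(z) = [[1, 0.5 z^-1], [0, 1]] and its exact
# FIR inverse V(z) = [[1, -0.5 z^-1], [0, 1]].
h_seq = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.5], [0.0, 0.0]]]
v_seq = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, -0.5], [0.0, 0.0]]]
g_seq = mimo_convolve(v_seq, h_seq)
for n, g_n in enumerate(g_seq):
    print(n, g_n)
```

Here G(0) is the identity and every later tap vanishes, so (5) holds with P = I, unit scale factors and zero delays.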

3.  Inverse  filter  criteria  for  MIMO  LTI 
systems 

For ease of later use, let $h_{ij}(n)$, $v_{ij}(n)$ and $g_{ij}(n)$
denote the elements of the matrices $H(n)$, $V(n)$
and $G(n)$, respectively. Moreover, let

$$C^{e_i}_{l_1,l_2} = \mathrm{CUM}(\underbrace{e_i(n), \ldots, e_i(n)}_{l_1\ \text{terms}}, \underbrace{e_i^*(n), \ldots, e_i^*(n)}_{l_2\ \text{terms}}) \qquad (8)$$

denote the $(l_1 + l_2)$th-order cumulant of the $i$th equalized
signal ($i \in \{1, \ldots, p\}$)

$$e_i(n) = g_{i1}(n) * u_1(n) + \cdots + g_{ip}(n) * u_p(n) \qquad (9)$$

where the superscript '*' denotes complex conjugation.

The  new  multichannel  inverse  filter  criteria  to  be 
presented  below  are  based  on  the  following  theorem: 

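Theorem 1. Let the $(l_1 + l_2)$th-order cumulant of
$u_i(n)$ be $\gamma^{u_i}_{l_1,l_2}$, where $i = 1, \ldots, p$. Assume that all the
$(2s)$th-order cumulants of $u_i(n)$, $i = 1, \ldots, p$, have the
same sign, i.e.,

$$\mathrm{sign}(\gamma^{u_1}_{s,s}) = \mathrm{sign}(\gamma^{u_2}_{s,s}) = \cdots = \mathrm{sign}(\gamma^{u_p}_{s,s}) \qquad (10)$$

Then under (A1) through (A4) and $l_1 + l_2 > 2s \ge 2$,

$$J_i = \frac{|C^{e_i}_{l_1,l_2}|^{2s}}{|C^{e_i}_{s,s}|^{l_1+l_2}} \qquad (11)$$

$$\le \kappa_{\max} = \max\{\kappa_j,\ j = 1, \ldots, p\} \qquad (12)$$

where

$$\kappa_j = \frac{|\gamma^{u_j}_{l_1,l_2}|^{2s}}{|\gamma^{u_j}_{s,s}|^{l_1+l_2}}. \qquad (13)$$

The equality of (12) holds if and only if

$$g_{ij}(n) = \alpha_i \delta(n - \tau_i)\delta(j - j_0), \quad j_0 \in \{1, \ldots, p\} \qquad (14)$$

where $\alpha_i$ is an unknown real (or complex) scale factor,
$\tau_i$ is an unknown time delay, and $j_0$ is the index associated
with the maximum value of $\kappa_j$, $j = 1, \ldots, p$ (see
(12)). □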
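
The bound in Theorem 1 can be illustrated numerically for real data. The sketch below is only an illustration under stated assumptions: it takes the criterion in the ratio form $J = |C_{l_1,l_2}|^{2s} / |C_{s,s}|^{l_1+l_2}$, uses the standard property that an $m$th-order cumulant of $e(n) = \sum_j g_j(n) * u_j(n)$ with independent i.i.d. inputs equals $\sum_j \gamma_j \sum_n g_j(n)^m$, and picks hypothetical input cumulants (unit variances, fourth-order cumulants 6 and -1.2). The function name `criterion_j` is not from the paper.

```python
import random


def criterion_j(g, gammas_l, gammas_s, l1, l2, s):
    """J = |C_{l1,l2}|^(2s) / |C_{s,s}|^(l1+l2) for real e(n) = sum_j g_j(n) * u_j(n).

    g[j] is the impulse response from input j to this output; gammas_l[j]
    and gammas_s[j] are the (l1+l2)th- and (2s)th-order cumulants of input j.
    """
    c_l = sum(gam * sum(x ** (l1 + l2) for x in gj)
              for gam, gj in zip(gammas_l, g))
    c_s = sum(gam * sum(x ** (2 * s) for x in gj)
              for gam, gj in zip(gammas_s, g))
    return abs(c_l) ** (2 * s) / abs(c_s) ** (l1 + l2)


s, l1, l2 = 1, 2, 2
gammas_s = [1.0, 1.0]    # assumed variances of the two inputs
gammas_l = [6.0, -1.2]   # assumed fourth-order cumulants
kappa_max = max(abs(gl) ** (2 * s) / abs(gs) ** (l1 + l2)
                for gl, gs in zip(gammas_l, gammas_s))

# A scaled, delayed spike on input 0 attains the bound.
j_spike = criterion_j([[0.0, 2.0, 0.0], [0.0, 0.0, 0.0]],
                      gammas_l, gammas_s, l1, l2, s)

# Random combined responses stay at or below the bound.
random.seed(0)
worst = max(
    criterion_j([[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)],
                gammas_l, gammas_s, l1, l2, s)
    for _ in range(1000))
print(kappa_max, j_spike, worst)
```

The spike response reaches `kappa_max`, while mixtures of several taps or several inputs fall below it, which is the property the inverse filter criteria exploit.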


Note that the results presented in Theorem 1 for
$(s, l_1, l_2) = (1,2,1)$ and $(1,2,2)$ have been proposed by
Tugnait [3], while those for other choices of $(s, l_1, l_2)$
such as $(1,3,1)$, $(1,3,2)$, ... are new. Next, based on
Theorem 1, two families of multichannel inverse filter
criteria, which follow, in part, the ideas proposed by
Inouye and Sato [2], are presented for finding the optimum
inverse filter $V(n)$.


A.  Family  of  MS  Criteria 

Let $v_i(n)^T$ denote the $i$th row vector of $V(n)$. The
family of MS criteria, which, at the $i$th stage, try to find
the optimum $v_i(n)$ using $J_i$ with some uncorrelatedness
constraints on the obtained $(i-1)$ inverse filter output
processes $e_1(n)$, ..., $e_{i-1}(n)$, is as follows:

Stage 1: Estimation of $v_1(n)$.
Maximize

$$J_1 \qquad (15)$$

where $J_1$ is given by (11) and $l_1 + l_2 > 2s \ge 2$.

Stage $i$: Estimation of $v_i(n)$ for $i \ge 2$.

Maximize 


jn=Ji-XiY: 


k-l 


6fc  I 


(16) 


where $l_1 + l_2 > 2s \ge 2$, $\lambda_i$ is a positive real constant,
$Z$ is the set of all integers, and


s  terms 


Qr(T)=CUM(ei(n),...,ei(n), 

eifc(^^-T),...,efc(n-r))  (17) 


s  terms 


The  deconvolution  and  equalization  capabilities  of 
the  proposed  MS  criteria  are  supported  by  the  follow¬ 
ing  theorem: 

Theorem 2. The MS criteria given by (15) and (16)
lead to a solution for $\mathcal{G}(z)$ that satisfies (5) (perfect
equalization), provided that $\lambda_i$ given by (16) are chosen
such that $\lambda_i \ge \kappa_{\max}$ (see (12)).


B.  Family  of  SS  Criteria 

Again, with some uncorrelatedness constraints on
the inverse filter output processes, the family of SS criteria
simultaneously estimates $v_1(n)$, $v_2(n)$, ..., $v_p(n)$
by maximizing


(18) 


where $l_1 + l_2 > 2s \ge 2$ and $\lambda$ is a positive real constant.
Their deconvolution and equalization capabilities
are supported by the following theorem.

Theorem 3. Under the assumption that all $\kappa_i$, $i = 1,
2, \ldots, p$ (see (13)), are the same, the SS criteria
lead to a solution for $\mathcal{G}(z)$ that satisfies (5) (perfect
equalization).


(R2) With $(s, l_1, l_2) = (1,2,1)$ or $(1,2,2)$, Tugnait's
MS approach [3] obtains the equalized signal $e_i(n)$
as the estimate of an input signal $u_j(n)$ by unconstrained
maximization of $J_i$ (i.e., without any
uncorrelatedness constraints on the inverse filter
output processes) at the $i$th stage. However, this
approach has to estimate the channel impulse responses
$h_{lj}(n)$, $l = 1, \ldots, q$, for each stage $i$, followed
by removing the contribution of $u_j(n)$ from
the measurements, leading to an MIMO system
with $q$ outputs and $(p - i)$ inputs for the next
stage. On the other hand, besides offering many new
choices for $(s, l_1, l_2)$, both of the proposed MS and SS
criteria estimate the optimum inverse filter
matrix $V(n)$ directly, without needing to estimate
the channel impulse responses for system
dimension reduction.

4.  Optimization  algorithms  for  MS  and 
SS  criteria 


To find the optimum inverse filter $V(n)$ using the
proposed MS and SS criteria given by (15), (16) and
(18) with a given set of data, the cumulants used in
these criteria can be simply replaced by the associated
sample cumulants, and a causal FIR filter of order $L$
can be used as an approximation to $V(n)$. Since the objective
function given by (15) and (16) for stage
$i$ is a highly nonlinear function of

$$v_i = [v_{i1}(0), \ldots, v_{i1}(L), \ldots, v_{iq}(0), \ldots, v_{iq}(L)]^T \qquad (19)$$

and the objective function given by (18) is also a
highly nonlinear function of

$$v = [v_1^T, v_2^T, \ldots, v_p^T]^T \qquad (20)$$

we use gradient type iterative optimization algorithms
to find the optimum $v_i$. Specifically, at the $k$th iteration,
$v_i$ (associated with MS criteria) is updated by


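$$v_i[k] = v_i[k-1] + \rho \cdot \left.\frac{\partial\,(\text{MS objective at stage } i)}{\partial v_i}\right|_{v_i = v_i[k-1]}$$

$$v_i[k] = v_i[k] / \|v_i[k]\| \qquad (21)$$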


Two worthy remarks regarding the proposed MS and
SS criteria are given as follows:

(R1) Theorems 2 and 3 are counterparts to the ones
proposed by Inouye and Sato [2] for their MS and
SS criteria, respectively, while the latter further
require a quite restrictive condition that $\gamma^{u_i}_{1,1} = 1$
for all $i$.


where $\rho$ is a positive real constant. Note that the second
line of (21) (i.e., normalization for $v_i[k]$) is due to
the fact that the objective function is invariant to any scaled
version of $v_i$ (see (14)). For SS criteria, $v$ given by (20) is also
updated in a similar way except that the MS objective function
is replaced by the SS objective function given by (18) and the
normalization operation is performed for
each $v_i$ in $v$ (see (20)).
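
The update-then-normalize pattern of (21) can be sketched on a single-channel toy problem. Everything in this block is an assumption made for illustration: the objective is a scale-invariant stand-in, $(\sum g^4)^2 / (\sum g^2)^4$ with $g = v * h$, rather than the sample-cumulant criteria of the paper; the gradient is taken by finite differences; and a step-halving safeguard, which is not part of (21), is added so that each accepted step increases the objective.

```python
def convolve(a, b):
    """Linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def objective(v, h):
    """Scale-invariant stand-in objective evaluated on g = v * h."""
    g = convolve(v, h)
    return sum(x ** 4 for x in g) ** 2 / sum(x ** 2 for x in g) ** 4


def numerical_gradient(f, v, eps=1e-6):
    """Central-difference gradient of f at v."""
    grad = []
    for k in range(len(v)):
        up = v[:k] + [v[k] + eps] + v[k + 1:]
        down = v[:k] + [v[k] - eps] + v[k + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def normalized_ascent(v, h, rho=0.05, iterations=200):
    """Gradient step followed by renormalization, with step halving."""
    best = objective(v, h)
    for _ in range(iterations):
        grad = numerical_gradient(lambda w: objective(w, h), v)
        step = rho
        while step > 1e-12:
            trial = [x + step * gx for x, gx in zip(v, grad)]
            norm = sum(x * x for x in trial) ** 0.5
            trial = [x / norm for x in trial]   # normalization line of (21)
            value = objective(trial, h)
            if value > best:
                v, best = trial, value
                break
            step /= 2
    return v, best


h = [1.0, 0.5]                      # hypothetical channel
v0 = [1.0, 0.0, 0.0, 0.0, 0.0]      # initial inverse filter, unit norm
start_value = objective(v0, h)
v_opt, final_value = normalized_ascent(v0, h)
print(start_value, final_value)
```

Because the objective does not change when `v` is rescaled, the normalization only keeps the iterate bounded; it never undoes the gain of the gradient step.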
5. Simulation results


In this section, we present two simulation examples
to demonstrate the proposed multichannel inverse
filter criteria. In the two examples, synthetic noisy
data $x(n)$ for SNR = 20 dB were generated from (1),
to which $q$ uncorrelated white Gaussian noise sequences were
added.

Example 1. (Real signals)

Let us consider a real 2-input 2-output (i.e., $p =
q = 2$) MA(6) system $\mathcal{H}(z)$ (taken from [3]) given by

Hn{z)  =  0.6455  -  0.32272”^ -t- 0.6445z"^ 

-0.3227z-^ 

Huiz)  =  0.6140  +  0.3684z-^ 

7^21  (z)  =  0.3873z“‘  +  0.8391z-^  -h  0.3227z“^ 
7^22(2)  =  -0.2579z"^  -  0.6140z"^  +  0.8442z“® 
-j-0.4421z“^  -f-  0.2579Z'® 

The driving inputs $u_i(n)$, $i = 1, 2$, were assumed to
be real, zero-mean, exponentially distributed i.i.d. with $\gamma^{u_i}_{1,1} = 1$,
$\gamma^{u_i}_{2,1} = 2$, $\gamma^{u_i}_{2,2} = 6$, and $\gamma^{u_i}_{3,2} = 24$, $i = 1, 2$. The MS
criteria for $(s, l_1, l_2) = (1,2,1)$, $(1,2,2)$ and $(1,3,2)$ were
used to obtain the inverse filters $v_i$, $i = 1, 2$, with filter
order $L = 14$ and $\lambda_2 = \kappa_{\max}$ (i.e., $\lambda_2 = 4$, 36 and 576
for $(s, l_1, l_2) = (1,2,1)$, $(1,2,2)$ and $(1,3,2)$, respectively)
as required by Theorem 2. Thirty independent
runs for $N = 2048$ and 4096 were performed.
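
The three values of $\lambda_2$ quoted above can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions: that $\kappa$ has the ratio form $|\gamma_{l_1,l_2}|^{2s} / |\gamma_{s,s}|^{l_1+l_2}$, and that the inputs are unit-variance exponential variables, whose $n$th-order cumulant is $(n-1)!$ (removing the mean does not change cumulants of order two and above). The function names are illustrative.

```python
from math import factorial


def exponential_cumulant(order):
    """nth-order cumulant of a unit-rate exponential variable: (n - 1)!."""
    return factorial(order - 1)


def kappa(s, l1, l2):
    """kappa = |gamma_{l1,l2}|^(2s) / |gamma_{s,s}|^(l1+l2) for a real input."""
    num = abs(exponential_cumulant(l1 + l2)) ** (2 * s)
    den = abs(exponential_cumulant(2 * s)) ** (l1 + l2)
    return num / den


for triple in [(1, 2, 1), (1, 2, 2), (1, 3, 2)]:
    print(triple, kappa(*triple))
```

The cumulants of orders 2 to 5 are 1, 2, 6 and 24, matching the values listed for the driving inputs, and the resulting ratios are 4, 36 and 576.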

A performance index MISI (taken from [1]), defined as

$$\mathrm{MISI} = \sum_{i=1}^{p} \frac{\sum_{j=1}^{p}\sum_{n} |g_{ij}(n)|^2 - \max\{|g_{ij}(n)|^2, \forall j, n\}}{\max\{|g_{ij}(n)|^2, \forall j, n\}} + \sum_{j=1}^{p} \frac{\sum_{i=1}^{p}\sum_{n} |g_{ij}(n)|^2 - \max\{|g_{ij}(n)|^2, \forall i, n\}}{\max\{|g_{ij}(n)|^2, \forall i, n\}} \qquad (22)$$

was used as a measure of the multichannel intersymbol
interference after equalization. Note that MISI = 0
when $\mathcal{G}(z)$ satisfies (5). Table 1 shows average values of
MISI's, denoted <MISI>, which were calculated with
the obtained thirty independent estimates of $g_{ij}(n)$.
One can see that all the MISI's after equalization are
smaller than the MISI before equalization (the top row
of Table 1), with the best MISI improvement (around
14 dB) for $(s, l_1, l_2) = (1,2,1)$.
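
The MISI index of (22) is straightforward to compute once the combined response is known. The sketch below is illustrative: it assumes a square $p \times p$ combined system stored as `g[i][j][n]` (output $i$, input $j$, tap $n$), and the function name `misi` is not taken from the paper.

```python
def misi(g):
    """MISI of g[i][j][n], the impulse response from input j to output i."""
    p = len(g)
    power = [[[abs(x) ** 2 for x in g[i][j]] for j in range(p)]
             for i in range(p)]
    total = 0.0
    for i in range(p):   # row term: each output should keep one dominant tap
        row_sum = sum(sum(power[i][j]) for j in range(p))
        row_max = max(max(power[i][j]) for j in range(p))
        total += (row_sum - row_max) / row_max
    for j in range(p):   # column term: each input should reach one output
        col_sum = sum(sum(power[i][j]) for i in range(p))
        col_max = max(max(power[i][j]) for i in range(p))
        total += (col_sum - col_max) / col_max
    return total


# Perfect equalization up to permutation, scaling and delay gives MISI = 0.
perfect = [[[0.0, 0.0], [0.0, 2.0]],
           [[3.0, 0.0], [0.0, 0.0]]]
# A single channel with one residual tap of half the main amplitude.
residual = [[[1.0, 0.5]]]
print(misi(perfect), misi(residual))
```

The values reported in Table 1 are in dB; a linear MISI value would be converted with `10 * log10(value)`.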


Example 2. (Complex signals)

A real 2-input 2-output nonminimum-phase MA(2)
system

$$\mathcal{H}(z) = \begin{bmatrix} 1 - 0.3z^{-1} + 0.8z^{-2} & -0.92z^{-1} \\ z^{-2} & 1 - 0.5z^{-1} + 0.2z^{-2} \end{bmatrix}$$

whose zeros are $0.5946 \pm j1.0738$ and $-0.1946 \pm j0.2614$,
was used. The input $u_1(n)$ was assumed to be an 8-PSK
signal of unity variance and the other input $u_2(n)$
was a 16-QAM signal of unity variance. The MS criteria
for $(s, l_1, l_2) = (1,2,2)$ were used to obtain the
inverse filters $v_i$, $i = 1, 2$, with filter order $L = 20$ and
$\lambda_2 = \kappa_{\max} = 1$ as required by Theorem 2. A single
realization was performed for data length $N = 4096$.
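
The quoted zeros can be checked against the system matrix, since the zeros of a square MIMO system are the roots of $\det \mathcal{H}(z)$. The sketch below is a hedged consistency check: the placement of the two off-diagonal entries ($-0.92z^{-1}$ and $z^{-2}$) is read from the extracted matrix and should be treated as an assumption, and `det_h` is an illustrative name.

```python
def det_h(z):
    """det H(z) for the 2 x 2 MA(2) system of Example 2 (entries as assumed)."""
    zi = 1 / z
    h11 = 1 - 0.3 * zi + 0.8 * zi ** 2
    h12 = -0.92 * zi
    h21 = zi ** 2
    h22 = 1 - 0.5 * zi + 0.2 * zi ** 2
    return h11 * h22 - h12 * h21


zeros = [complex(0.5946, 1.0738), complex(0.5946, -1.0738),
         complex(-0.1946, 0.2614), complex(-0.1946, -0.2614)]
for z in zeros:
    # z^4 det H(z) is the polynomial z^4 - 0.8 z^3 + 1.15 z^2 + 0.46 z + 0.16
    print(z, abs(det_h(z) * z ** 4))
```

The residuals are small compared with the polynomial's value away from the zeros (about 1.97 at $z = 1$). Two of the zeros lie outside the unit circle, which is why the system is nonminimum phase.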

The MISI before equalization is 7.8460 dB and the
MISI after equalization is -13.6245 dB for this example,
i.e., the use of the proposed MS criteria led to
around 21 dB improvement in MISI. Moreover, Figures
1(a) and 1(b) show the unequalized signal constellations
(i.e., eye patterns) associated with $x_1(n)$ and
$x_2(n)$, respectively, for $n = 0 \sim 4095$. Figures 1(c) and
1(d) show the equalized signal constellations associated
with $e_1(n)$ and $e_2(n)$, respectively, for $n = 0 \sim 4095$.
One can see from these figures that the eye patterns
after equalization are open to a sufficient degree.

6.  Conclusions 

Based on Theorem 1 and Inouye and Sato's ideas,
we have extended Chi and Wu's single channel inverse
filter criteria to a family of MS and a family of SS criteria
for blind deconvolution and equalization of real
(or complex) MIMO LTI systems. Furthermore, we
proved (Theorems 2 and 3) that under some conditions,
both of the proposed MS and SS criteria lead
to perfect equalization, and their efficacy was justified
through computer simulations.
Table 1. Average values of MISI's over thirty
independent runs for SNR = 20 dB.

Initial MISI = 9.8293 dB (before equalization)

$(s, l_1, l_2)$    <MISI> (in dB)
                   N = 2048    N = 4096
(1,2,1)            -4.1371     -4.7832
(1,2,2)             0.4316     -2.4964
(1,3,2)             5.8262      2.3185

Figure 1. (a) and (b) Unequalized signal constellations associated with $x_1(n)$ and $x_2(n)$, respectively, for
$n = 0 \sim 4095$ and SNR = 20 dB; (c) and (d) equalized signal constellations associated with $e_1(n)$ and
$e_2(n)$, respectively, for $n = 0 \sim 4095$ using the MS criteria with $(s, l_1, l_2) = (1,2,2)$.
Abstract


Blind deconvolution and blind equalization have been
important and interesting topics in diverse fields including data
communication, image processing and geophysical data
processing. Recently, Inouye and Habe proposed a constrained
multistage criterion for attaining blind deconvolution
of multichannel linear time-invariant (LTI) systems
[2]. In this paper, based on their constrained criterion, we
present an iterative algorithm for solving the blind deconvolution
problem of multichannel LTI systems. Inouye and
Sato proposed new unconstrained criteria for accomplishing
the blind deconvolution of multichannel LTI systems [3].
Based on their unconstrained criteria, we show iterative algorithms
for solving the blind deconvolution of multichannel
LTI systems. Simulation examples are included to examine
the proposed algorithms.


1  Introduction 

Blind deconvolution and blind equalization have been
important and interesting topics in diverse fields including data
communication, image processing and geophysical data
processing. Recently, Inouye and Habe proposed a constrained
multistage criterion for attaining blind deconvolution
of multichannel linear time-invariant (LTI) systems
[2]. In this paper, based on their constrained criterion, we
present an iterative algorithm for solving the blind deconvolution
problem of multichannel LTI systems. Inouye and
Sato proposed a new unconstrained multistage criterion for
accomplishing the blind deconvolution of multichannel LTI
systems [3]. Under the assumption that all the magnitudes
of the fourth-order auto-cumulants of components of the
input vector process are identical, Inouye and Sato also presented
a new unconstrained single-stage maximization criterion
for attaining the blind deconvolution of multichannel
LTI systems [3]. Based on their unconstrained criteria, we
show iterative algorithms for solving the blind deconvolution
of multichannel LTI systems. Simulation examples are
included to examine the proposed algorithms.

In this paper, we use the same notation as in [2], [3].

2  Problem  Formulation 


Let us consider the system shown in Fig. 1. It is a cascade
connection of an unknown multichannel system $H$
and a multichannel equalizer $W$. We make the following
assumptions on the system and the signals involved.

(A1) The unknown system $H(z)$ is described by

$$y(t) = \sum_{k=-\infty}^{\infty} H(k) u(t-k) \qquad (1)$$

where $y(t)$ is a real/complex $n$-column output vector, $u(t)$
is a real/complex $n$-column input vector, and $\{H(k)\}$ is
a real/complex $n \times n$ matrix sequence called the impulse
response. The system is stable, that is, the impulse response
satisfies the absolute summability condition

$$\sum_{k=-\infty}^{\infty} \|H(k)\| < \infty. \qquad (2)$$

fe=-oo 


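[Figure 1: block diagram. The input u(k) drives the unknown system H(z), whose output feeds the equalizer W(z) to produce z(k); G(z) denotes the cascade.]

Figure 1. Unknown system and equalizer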
(A2) The transfer function defined by

$$H(z) := \sum_{k=-\infty}^{\infty} H(k) z^{-k} \qquad (3)$$

is of full rank on the unit circle $|z| = 1$ (this implies it has
no zero on the unit circle).

(A3) The input process $\{u(t)\}$ is a zero-mean, non-Gaussian
vector process, whose component processes
$\{u_i(t)\}$, $i = 1, \ldots, n$, are mutually independent. Moreover,
each component process $\{u_i(t)\}$ is an independently
and identically distributed (i.i.d.) process with non-zero
variance $\sigma^2_{u_i} \ne 0$ and non-zero fourth-order auto-cumulant
$\kappa_{4,u_i} \ne 0$.

(A4) The equalizer $W(z)$ is described by

$$z(t) = \sum_{k=-\infty}^{\infty} W(k) y(t-k) \qquad (4)$$

where $z(t)$ is a real/complex $n$-column vector, called the
equalizer output, and $\{W(k)\}$ is a real/complex $n \times n$
matrix sequence. It is assumed that the equalizer $W$ is also
stable.

There are inherent ambiguities in the solution to the multichannel
deconvolution problem: in general, we
cannot identify the order of the arrangement of the components
$u_1(t), \ldots, u_n(t)$ of the input vector $u(t)$, the time origin
of each component $u_i(t)$, or the magnitude of each component
$u_i(t)$.

Taking these ambiguities into account, the multichannel
blind deconvolution problem is formulated as finding
an equalizer $W$ such that the transfer function $G(z)$ of
the combined system takes the form

$$G(z) = P \Lambda(z) D \qquad (5)$$

where $P$ is a permutation matrix, $\Lambda(z)$ is a diagonal matrix
with diagonal entries $\lambda_{ii}(z) = z^{-k_i}$, $i = 1, \ldots, n$ (where $k_i$
is an integer), and $D$ is a constant diagonal matrix. Moreover,
if we know all the variances of the
components of the input process in advance, we can constrain
the diagonal matrix $D$ in (5) to be a diagonal
matrix whose diagonal entries all have unit magnitude.

We require the following notions for stationary random
vector processes in this paper. A stationary
random process $\{u(t)\}$ is said to satisfy the normalized condition
if the variance of each component of the vector process
$\{u(t)\}$ is equal to unity. It is said to satisfy the normalized
whitening condition if all the component processes
$\{u_i(t)\}$, $i = 1, \ldots, n$, of $\{u(t)\}$ are white random processes
with unit variance and they are mutually uncorrelated.

3  Blind  Deconvolution 

To begin with, let us assume that the input process $\{u(t)\}$
satisfies the normalized condition, obtained by dividing each component
$\{u_i(t)\}$ by the square root of its variance $\sigma^2_{u_i}$ to eliminate
the magnitude ambiguity.

First we consider the following multistage maximization
criterion (A):

(Stage 1) Maximize $|\kappa_{4,z_1}|$ subject to $\sigma^2_{z_1} = 1$.

(Stage $k$) Maximize $|\kappa_{4,z_k}|$ subject to $\sigma^2_{z_k} = 1$ and
$r_{z_k,z_i}(\tau) = 0$ for all $\tau \in Z$, and all $i = 1, 2, \ldots, k-1$.
Here $k$ moves successively from 2 to $n$.

Then we obtain the following theorem.

Theorem 1 [2]: Under the normalized condition of the
input process, the multistage maximization criterion (A)
gives a solution to the multichannel blind deconvolution
problem.

4  Iterative  Algorithms 

Based on the multistage maximization criterion (A), we
develop an iterative algorithm for multichannel blind deconvolution
by the gradient or steepest-descent method. At
Stage $k$ in the criterion (A), the maximization of each criterion
function should be subjected to the following first
constraint or the first and second constraints:

(C1) $\sigma^2_{z_k} = 1$,

(C2) $r_{z_k,z_i}(\tau) = 0$, $\tau \in Z$, $i = 1, \ldots, k-1$.

The first constraint is the same as the one treated in the
single-channel case [1], but the second one appears to be
new in the multichannel case. In order to translate these constraints
into constraints on the coefficients (tap-coefficients)
of the equalizer, we first require a whitening operation for the
output process $\{y(t)\}$. This is also the case for single-channel
blind deconvolution [1]. Let $\{\tilde{y}(t)\}$ be a whitened
process of $\{y(t)\}$. Such a whitening filter exists, but it is not unique.

Let us denote the transfer function of an equalizer for the
cascade system of the unknown system and the whitening
filter by $C(z)$. We make the following assumption on the
equalizer $C$.

(A5) The equalizer $C$ is assumed to be a causal FIR
system of sufficient length, so that the truncation effects can
be ignored.

Then we can represent the transfer function $C(z) =
[c_{kj}(z)]_{k,j=1}^{n}$ by

$$c_{kj}(z) = \sum_{l=0}^{l_k-1} c_{kj}(l) z^{-l}, \quad k, j = 1, \ldots, n \qquad (6)$$

where $\{c_{kj}(l) : l = 0, \ldots, l_k - 1\}$ denotes the sequence of
the coefficients (or tap-coefficients) of equalizer $C$.

Now we can represent the constraints of (C1) and (C2) by
the tap-coefficients of equalizer $C$. Since $\{\tilde{y}(t)\}$ satisfies
the normalized whitening condition, it follows from (A5)
along with (6) that

$$\sigma^2_{z_k} = r_{z_k,z_k}(0) = \|c_k\|^2 \qquad (7)$$

where

$$c_k = [c_{k1}^T, \ldots, c_{kn}^T]^T \in \mathbb{C}^{n l_k} \qquad (8)$$

$$c_{kj} = [c_{kj}(0), \ldots, c_{kj}(l_k - 1)]^T \in \mathbb{C}^{l_k}. \qquad (9)$$

This means the first condition (C1) is equivalent to

$$\|c_k\|^2 = 1. \qquad (10)$$

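Similarly, it follows from (A5) along with (6) that

$$r_{z_k,z_i}(-\tau) = \sum_{j=1}^{n} \Big\{ \sum_{t=0}^{l_k-1} q^{\tau} c^*_{ij}(t)\, c_{kj}(t) \Big\}; \quad \tau \in Z,\ i = 1, \ldots, k-1, \qquad (11)$$

where $q$ is the forward shift operator defined by $q c^*_{ij}(t) =
c^*_{ij}(t+1)$, and $q^{-1}$ is the backward shift operator defined
by $q^{-1} c^*_{ij}(t) = c^*_{ij}(t-1)$.

Let $c_i$ denote the column vector consisting of the coefficients
$\{c_{ij}(0), \ldots, c_{ij}(l_i - 1)\}_{j=1}^{n}$ defined in the same way
as the column vector given by (8) and (9). In order to treat
the above relations in (11) on a unitary space, we embed the
vectors $c_1, \ldots, c_{k-1}$ into the unitary space $\mathbb{C}^{n l_k}$ containing
$c_k$. This means that $\{l_k\}_{k=1}^{n}$ is an increasing sequence and
that we reset $c_i$ as

$$c_i = [c_{i1}^T, \ldots, c_{in}^T]^T \in \mathbb{C}^{n l_k} \qquad (12)$$

$$c_{ij} = [c_{ij}(0), \ldots, c_{ij}(l_k - 1)]^T \in \mathbb{C}^{l_k} \qquad (13)$$

where $c_{ij}(\tau) = 0$ for $\tau \ge l_i$. Then it follows from (11) that

$$r_{z_k,z_i}(-\tau) = (q^{\tau} c_i)^H c_k = \langle c_k, q^{\tau} c_i \rangle, \qquad (14)$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product on the space $\mathbb{C}^{n l_k}$, and
$q^{\tau} c_i$ is defined by

$$q^{\tau} c_i = [q^{\tau} c_{i1}^T, \ldots, q^{\tau} c_{in}^T]^T \qquad (15)$$

$$q^{\tau} c_{ij} = [q^{\tau} c_{ij}(0), \ldots, q^{\tau} c_{ij}(l_k - 1)]^T \in \mathbb{C}^{l_k}. \qquad (16)$$

Since $c_{ij}(t) = 0$ for $t < 0$ or $t \ge l_i$, we have $q^{\tau} c_i = 0$ for
$\tau \le -l_k$ or $\tau \ge l_i$. Therefore, (14) means that the second
condition (C2) is equivalent to

$$\langle c_k, q^{\tau} c_i \rangle = 0; \quad -(l_k - 1) \le \tau \le l_i - 1,\ i = 1, \ldots, k-1. \qquad (17)$$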

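Let $m_k$ be the number of the constraints in (17) defined by

$$m_k = \sum_{i=1}^{k-1} \{ l_k + (l_i - 1) \} \qquad (18)$$

and $B_k$ be an $n l_k \times m_k$ dimensional matrix defined by

$$B_k = [C_1, \ldots, C_{k-1}] \in \mathbb{C}^{n l_k \times m_k} \qquad (19)$$

$$C_i = [q^{l_i - 1} c_i, q^{l_i - 2} c_i, \ldots, q^{-(l_k - 1)} c_i] \in \mathbb{C}^{n l_k \times (l_k + l_i - 1)}. \qquad (20)$$

We decompose the unitary space $\mathbb{C}^{n l_k}$ into two orthogonal
subspaces as

$$\mathbb{C}^{n l_k} = \mathrm{Im} B_k \oplus (\mathrm{Im} B_k)^{\perp}, \qquad (21)$$

where $\mathrm{Im} B_k$ denotes the image space of matrix $B_k$ and
$(\mathrm{Im} B_k)^{\perp}$ denotes the complementary orthogonal subspace
of $\mathrm{Im} B_k$ in $\mathbb{C}^{n l_k}$. Then it follows from (14) and (17) that
the second condition (C2) is equivalent to

$$c_k \in (\mathrm{Im} B_k)^{\perp}. \qquad (22)$$

In order that there is a non-trivial solution $c_k \ne 0$ of (22), it
must hold that $n l_k > m_k$, provided that all the column vectors of $B_k$
are linearly independent. This means along with (18) that
the sequence $\{l_k\}_{k=1}^{n}$ is an increasing sequence satisfying

$$l_k > \sum_{i=1}^{k-1} (l_i - 1), \quad k = 1, \ldots, n. \qquad (23)$$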

We now describe an iterative procedure for solving the
constrained maximization problems in the multistage criterion
(A) by the gradient method. The procedure for finding
a solution which attains the maximum of the criterion function
of Stage 1 subject to (C1) with $k = 1$ is just the same
as that of Shalvi and Weinstein [1], and thus each iteration of the
procedure at Stage 1 consists of the following two steps:

$$c_1' = c_1 + \mu_1 \nabla_{c_1} F_1, \qquad (24)$$

$$c_1'' = c_1' / \|c_1'\|, \qquad (25)$$

where $c_1$ is the vector of the first-row tap-coefficients (defined
by (8) and (9) with $k = 1$) before the iteration, $c_1''$ is the
vector of the tap-coefficients after the iteration, $\mu_1$ is a positive
constant that regulates the step-size, and $\nabla_{c_1} F_1$ is the gradient
of the criterion function $F_1$ of Stage 1 with respect to
$c_1$.

At Stage $k$ (where $2 \le k \le n$), in addition to (C1)
we should take account of the second condition (C2), which is
equivalent to $c_k \in (\mathrm{Im} B_k)^{\perp}$. Let $P_k$ denote the matrix of
the projection onto the subspace $\mathrm{Im} B_k$ along $(\mathrm{Im} B_k)^{\perp}$, and
$P_k^{\perp}$ denote the matrix of the projection onto the subspace
$(\mathrm{Im} B_k)^{\perp}$ along $\mathrm{Im} B_k$. Then any vector $c_k$ in $\mathbb{C}^{n l_k}$ has an
orthogonal decomposition given by

$$c_k = c_k^{(1)} + c_k^{(2)} \qquad (26)$$

where

$$c_k^{(1)} = P_k c_k \quad \text{and} \quad c_k^{(2)} = P_k^{\perp} c_k. \qquad (27)$$

It is proved that

$$\nabla_{c_k^{(2)}} F_k = P_k^{\perp} \nabla_{c_k} F_k. \qquad (28)$$

It follows from (27) and (28) that

$$c_k^{(2)} + \mu_k \nabla_{c_k^{(2)}} F_k = P_k^{\perp} c_k + \mu_k P_k^{\perp} \nabla_{c_k} F_k = P_k^{\perp} \{ c_k + \mu_k \nabla_{c_k} F_k \}, \qquad (29)$$

where $\mu_k > 0$.

Based on the preceding discussions, each iteration of the
procedure for finding a point which attains the maximum of
the criterion function $F_k$ of Stage $k$ subject to (C1) and (C2)
(where $2 \le k \le n$) is made of the following two steps:

$$c_k' = P_k^{\perp} \{ c_k + \mu_k \nabla_{c_k} F_k \}, \qquad (30)$$

$$c_k'' = c_k' / \|c_k'\|, \qquad (31)$$

where $c_k$ is the vector before the iteration, and $c_k''$ is the
vector after the iteration.
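
The two-step iteration at Stage $k$ (a gradient step projected onto $(\mathrm{Im} B_k)^{\perp}$, followed by renormalization) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: real-valued vectors are used instead of complex ones, the orthogonal projector is built by Gram-Schmidt instead of an SVD to keep the example dependency-free, the gradient is supplied by the caller, and all function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two real vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(columns, tol=1e-12):
    """Gram-Schmidt basis for the span of the given real column vectors."""
    basis = []
    for col in columns:
        w = list(col)
        for q in basis:
            coeff = dot(q, w)
            w = [x - coeff * y for x, y in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:               # skip linearly dependent columns
            basis.append([x / norm for x in w])
    return basis


def project_onto_complement(c, basis):
    """Apply I - Q Q^T, removing the part of c that lies in Im B."""
    for q in basis:
        coeff = dot(q, c)
        c = [x - coeff * y for x, y in zip(c, q)]
    return c


def projected_step(c, grad, basis, mu):
    """Gradient step, projection onto (Im B)^perp, then renormalization."""
    moved = [x + mu * g for x, g in zip(c, grad)]
    moved = project_onto_complement(moved, basis)
    norm = dot(moved, moved) ** 0.5
    return [x / norm for x in moved]


# Hypothetical constraint matrix B with columns (1,0,0) and (1,1,0).
b_columns = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(b_columns)
c_new = projected_step([0.2, 0.3, 1.0], [1.0, 1.0, 1.0], basis, mu=0.1)
print(c_new)
```

After the step, `c_new` has unit norm, matching (C1), and is orthogonal to every column of `B`, matching (C2).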


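The Stochastic Gradient Algorithm

The algorithm goes on recursively in time $t$. Assume
the coefficients of the equalizer have been computed at
time $t - 1$. Then they are updated at time $t$ from the data
$\{z_j(t), y_j(t - \tau);\ j = 1, \ldots, n;\ 0 \le \tau < \max(l_j)\}$ by the
following steps:

Step 1. Compute $\{c''_{1j}(\tau);\ \tau = 0, \ldots, l_1 - 1\}_{j=1}^{n}$ by

$$c'_{1j}(\tau) = c_{1j}(\tau) + \mu_1\, \mathrm{sgn}(\kappa_{4,z_1}) \times [\,|z_1(t)|^2 z_1(t) y_j^*(t - \tau) - \langle z_1(t)^2 \rangle z_1^*(t) y_j^*(t - \tau)\,]$$

$$c''_{1j}(\tau) = \frac{c'_{1j}(\tau)}{\sqrt{\sum_{j=1}^{n} \sum_{\tau=0}^{l_1-1} |c'_{1j}(\tau)|^2}}$$

$$\langle z_1(t)^2 \rangle = (1 - \mu_e) \langle z_1(t-1)^2 \rangle + \mu_e z_1(t)^2$$

where $\mu_1$ is a positive constant that regulates the step-size,
and $\mu_e$ is a positive constant that regulates the step-size
used for estimating $E\{z_1(t)^2\}$. Then set
$c_{1j}(\tau) = c''_{1j}(\tau)$ for $j = 1, \ldots, n$; $\tau = 0, \ldots, l_1 - 1$.

Step 2. Set $k = 2$.

Step 3. Define $\{c_i\}_{i=1}^{k-1}$ by

$$c_{ij} = [c_{ij}(0), \ldots, c_{ij}(l_k - 1)]^T \in \mathbb{C}^{l_k}$$

where $c_{ij}(\tau) = 0$ for $\tau \ge l_i$.

Step 4. Define $\{C_i\}_{i=1}^{k-1}$ by

$$C_i = [q^{l_i - 1} c_i, q^{l_i - 2} c_i, \ldots, q^{-(l_k - 1)} c_i].$$

Step 5. Compute the projection matrix $P_k^{\perp}$ representing the
projection onto the complementary orthogonal subspace of
the image space of matrix $[C_1, \ldots, C_{k-1}]$. (This computation
may be carried out by finding an SVD (singular value
decomposition) of $[C_1, \ldots, C_{k-1}]$.)

Step 6. Compute $\{c'_{kj}(\tau);\ \tau = 0, \ldots, l_k - 1\}_{j=1}^{n}$ by

$$c'_{kj}(\tau) = c_{kj}(\tau) + \mu_k\, \mathrm{sgn}(\kappa_{4,z_k}) \times [\,|z_k(t)|^2 z_k(t) y_j^*(t - \tau) - \langle z_k(t)^2 \rangle z_k^*(t) y_j^*(t - \tau)\,]$$

$$\langle z_k(t)^2 \rangle = (1 - \mu_e) \langle z_k(t-1)^2 \rangle + \mu_e z_k(t)^2$$

where $\mu_k$ is a positive constant that regulates the step-size,
and $\mu_e$ is a positive constant that regulates the step-size for
estimating $E\{z_k(t)^2\}$.

Step 7. Define $c_k'$ by

$$c_k' = [c_{k1}'^T, \ldots, c_{kn}'^T]^T.$$

Step 8. Compute $c_k''$ by

$$c_k'' = P_k^{\perp} c_k'.$$

Step 9. Define $\{c''_{kj}(\tau);\ \tau = 0, \ldots, l_k - 1\}_{j=1}^{n}$ by

$$[c''_{kj}(0), \ldots, c''_{kj}(l_k - 1)]^T = c''_{kj}, \qquad [c_{k1}''^T, \ldots, c_{kn}''^T]^T = c_k''.$$

Step 10. Compute $\{c_{kj}(\tau);\ \tau = 0, \ldots, l_k - 1\}_{j=1}^{n}$ by

$$c_{kj}(\tau) = \frac{c''_{kj}(\tau)}{\sqrt{\sum_{j=1}^{n} \sum_{\tau=0}^{l_k-1} |c''_{kj}(\tau)|^2}}.$$

Step 11. If $k < n$, set $k = k + 1$ and go to Step 3. If $k = n$,
then go to the next time $t + 1$. To go to the next time, we set $t =
t + 1$ and collect new data $\{z_j(t+1), y_j(t+1);\ j = 1, \ldots, n\}$
from each output, where the $z_j(t+1)$'s are the equalizer outputs
(at $t + 1$) computed using the new coefficients of the equalizer
updated at time $t$. Then go back to Step 1.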


5  The  Unconstrained  Criterion 

It is generally more difficult to solve a maximization
problem with constraints than to solve a constraint-free maximization
problem equivalent to the original one. In the
sequel, we develop constraint-free criteria for solving the
multichannel blind deconvolution problem.

Let us assume that we know all the magnitudes of the
fourth-order auto-cumulants of the components of the vector
process in advance and that they satisfy the following decreasing
sequence condition

$$|\gamma_1| \ge |\gamma_2| \ge \cdots \ge |\gamma_n| \qquad (32)$$

where $\gamma_i := \kappa_{4,u_i}$ for $i = 1, \ldots, n$. Consider the following
potential function [1] defined by

$$\phi(z_i) := |\kappa_{4,z_i}| + |\gamma_i| f(\sigma^2_{z_i}) \qquad (33)$$

where $f(\cdot)$ is a continuous real-valued function over $[0, \infty)$
such that

$$p(x) := x^2 + f(x) \qquad (34)$$

monotonically increases in $0 \le x < 1$, monotonically decreases
for $x > 1$, and has a unique maximum at $x = 1$. Such a
function, for example, is given by $p(x) = 2ax - ax^2$, $a > 0$.

Corresponding to the multistage maximization criterion
(A), we consider the following unconstrained multistage
maximization criterion (B):

(Stage 1): Maximize

$$J_1 := |\kappa_{4,z_1}| + |\gamma_1| f(\sigma^2_{z_1}) \qquad (35)$$

(Stage  k):  Maximize 
Jk  :=  |«4,ztl  +  l7fc|/(<^z»)  “  ■^o(y^  )  (36) 

i=l tZZ 

where $\lambda_0$ is a positive constant greater than $|\gamma_1|$, i.e., $\lambda_0 >
|\gamma_1|$.

Based on Theorem 1, we have the following theorem.

Theorem 2 [3]: Under the normalized whitening condition
of the input process $\{u(t)\}$, the unconstrained multistage
maximization criterion (B) gives a solution to the
multichannel deconvolution problem.

Proof: The proof is omitted.

Under the assumption that all the magnitudes of the
fourth-order auto-cumulants of the components of the input
vector process are identical, Inouye and Sato also presented
a new unconstrained single-stage maximization criterion
for attaining the blind deconvolution of multichannel
LTI systems [3]. Based on their unconstrained criteria, we
can also derive on-line (or stochastic gradient) algorithms for
solving the blind deconvolution of multichannel LTI systems.
However, the algorithms are omitted owing to the page limit.

6  Simulation  Results 


Figure 2. Performances of the three on-line
algorithms.


the multichannel intersymbol interference denoted by $M_{ISI}$,
which was introduced in [2] and [3]. The initial $M_{ISI}$ in logarithmic
(dB) scale was 5.0803 dB for the three algorithms.
The three algorithms were also examined in 10 independent
Monte Carlo runs using 3,000 data sample points at the
channel outputs for each Monte Carlo run.

We note that neither the algorithm (B) nor the algorithm
(C) needs prewhitening of the channel outputs,
even when the channel outputs are not already white, but the gradients of the
criterion functions in the algorithm (B) and the algorithm
(C) involve more terms than those of the criterion functions
in the algorithm (A). In Fig. 2, we plotted the average $M_{ISI}$,
denoted by $\langle M_{ISI} \rangle$, over 10 independent Monte Carlo runs.
It can be seen from Fig. 2 that the algorithm (C) exhibits
the best performance of all the algorithms.


In order to see the performance of the proposed algorithms,
we examined the on-line (or stochastic gradient)
algorithm (A) based on the constrained multistage maximization
criterion, the on-line algorithm (B) based on the
unconstrained multistage maximization criterion, and the
on-line algorithm (C) based on the unconstrained single-stage
maximization criterion.

We considered the following scenario: Since the algorithm
(A) requires (multichannel) spectral prewhitening of
the output process of the unknown system as in [1], we chose
an all-pass LTI system for the unknown system, which is a
2-input and 2-output system described by


H{z)  = 


V  0 


0 

0.2+z-' 

i+o;2z-' 


(37) 


Hence  we  need  not  to  paform  a  whitening  operation  for 
the  output  process  ahead  in  applying  the  algorithm  (A),  and 
thus  we  used  the  same  symbol  W  as  an  equalizer  in  apply¬ 
ing  the  three  algorithms.  The  channel  input  signals  ui  (f) 
and  U2{t)  were  data  sequences  from  mutually  independent 
4-PSK  (phase-shift  keying)  sources  with  unit  variance.  We 
used  a  2-input  and  2-output  equalizer  W.  The  length  h 
and  the  length  h  of  equalizer  W  were  chosen  to  be  12  and 
24,  respectively.  As  a  measure  of  performance  we  used 


7  Conclusions 

We proposed the on-line algorithm (A) based on the constrained multistage criterion. Then we presented the on-line algorithm (B) based on the unconstrained multistage criterion and the on-line algorithm (C) based on the unconstrained single-stage criterion. Simulation results were shown to illustrate the performance of the three algorithms (A), (B), and (C).
ABSTRACT

A number of fourth-order HOS approaches are reviewed for applicability to wideband RF propagation in urban environments. A lattice geometry is used to model the propagation constraints that restrict the mean direction of arrival. Errors in the estimation of this angle can potentially be mitigated, and the signal level increased, using blind deconvolution with a distributed set of sources. With a moving receiver, a time average can also potentially improve these estimates. Data results show some of these modeled propagation effects, which exhibit dramatic angular changes with slight frequency shifts.

1.  Introduction 

In current digital personal cellular mobile communications systems (PCS) and wireless digital local-area networks (WLAN) used in urban environments, multiple propagation paths (multipaths) received at the same point can cause significant signal loss and phase distortion of the transmitted data. In the higher-frequency bands (e.g., 902-928 MHz, 2.5 GHz), these effects can produce large link-margin losses (e.g., 100 dB) over short ranges. Wideband operation for digital transmission in
excess  of  1  Mbps  requires  a  few  MHz  of  bandwidth  and 
signal-to-noise  ratios  (SNR)  in  excess  of  10  dB  to  achieve 
tolerable  bit-error  rates  (i.e.,  BER  <  10'^).  The  effects  of 
multipath  fading  and  local  phase  distortion  can  cause  SNR 
losses  of  over  20  dB  and  long  time  delay  spreads,  which 
combine  to  severely  limit  PCS/WLAN  applications  in 
urban environments. An approach to mitigating multipath effects using higher-order statistics (HOS) is presented and related to past theoretical work and recent data measurements.


Simplistic  Representation 

One can view an urban propagation environment as a collection of free pathways and building-blocked pathways on a rectangular grid. This lattice structure is analogous to a solid-state crystal, with a unit cell having a “diatomic” cubic structure consisting of an “L”-shaped free path and a square-shaped solid path, as shown in Figure 1. A transmitter source (TX) propagates at a wavelength $\lambda$ between these unit cells, of length 2a and cell location (k, l), with Snell’s Law reflection from the solid squares. The received propagation at a single point (RX) consists of direct, line-of-sight (LOS) propagation and N multiple scattered paths, each with from 1 to m (e.g., 7) reflections. A few example pathways are shown in Figure 1, with a mean received pathway direction of arrival (DOA), $\phi$, being different from the direct path.

If the datalink transmitted signal is represented as a complex modulated function $s(t)$ at frequency $f_0$, then the received signal $r(t)$ at range $R$ is just a phase-delayed LOS direct-path signal, with amplitude reduced by $\alpha$:

$$r(t) = \alpha\, s(t - \tau) \qquad (1)$$

For a multipath environment, each $m$-th path contributes to the received multipath signal $r_m(t)$, which can be represented by a sum of signals as:

$$r_m(t) = \sum_{m=1}^{N} \alpha_m(t)\, s(t - \tau_m(t)) \qquad (2)$$

with amplitude distortion $\alpha_m(t)$ and time delay $\tau_m(t)$ being time-dependent functions arising from changes in the RX/TX location, or from changes in the propagation path caused by other motion. Generally in open fields, N is only 2 because of a single ground reflection, but in urban areas, N can be large.
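The sum in Equation 2 can be illustrated with a minimal sketch. It assumes the form $r_m(t) = \sum_m \alpha_m s(t - \tau_m)$ with amplitudes and delays held constant; the 915 MHz tone, the path gains, and the function names are hypothetical values chosen only for illustration.

```python
import cmath
import math


def received_multipath(s, t, paths):
    """Sum of delayed, scaled copies of s: r_m(t) = sum_m alpha_m * s(t - tau_m).

    `paths` is a list of (alpha_m, tau_m) pairs, held constant here for simplicity.
    """
    return sum(alpha * s(t - tau) for alpha, tau in paths)


# Hypothetical carrier: a unit-amplitude complex tone at f0 = 915 MHz.
F0 = 915e6


def tone(t):
    return cmath.exp(2j * math.pi * F0 * t)


# Open-field case (N = 2): direct path plus one ground reflection.
half_period = 0.5 / F0
in_phase = [(1.0, 0.0), (0.8, 2 * half_period)]   # echo delayed by one full period
out_of_phase = [(1.0, 0.0), (0.8, half_period)]   # echo delayed by half a period

strong = abs(received_multipath(tone, 0.0, in_phase))
faded = abs(received_multipath(tone, 0.0, out_of_phase))
fade_db = 20 * math.log10(faded / strong)
```

In this two-path case a delay change of half a carrier period (about 0.55 ns here) is enough to move the received amplitude from 1.8 to 0.2, a fade of roughly 19 dB.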
The analogy to lattice propagation in cubic solid-state crystals [1] is analyzed in terms of the atomic oscillation modes, which are either longitudinal or transverse to the direction of travel. Here the diatomic cell has two atomic masses (i.e., $M_1$, $M_2$) with force constant $C$, and each propagation mode has two vibrations for a given wavelength at frequency $f$ ($f = c/\lambda$, where $c$ is the speed of propagation). Nearest-neighbor interaction (such as a single reflection point per unit cell in Figure 1) simplifies the equation of motion to two traveling-wave solutions, with $\omega$ frequency dependence ($\omega = 2\pi f$) on the wave number $k = 2\pi/\lambda$, and the first Brillouin zone boundary is at $k_{max} = \pm\pi/2a$. At $k_{max}$, the two propagation modes are limited to $\omega^2 = 2C/M_2$ (optic) and $\omega^2 = 2C/M_1$ (acoustic) for $M_1 > M_2$. These two wave number solutions are analogous to the reflections of Figure 1 residing within one corridor (i.e., example m = 2) or between two corridors (i.e., example m = 4), and have two distinct dispersion relationships ($\omega$ vs. $k$ in Figure 17 of Ref. 1).
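The zone-boundary limits quoted above can be checked numerically. The sketch below assumes the standard textbook nearest-neighbor relation for a diatomic chain with repeat length 2a, $M_1 M_2 \omega^4 - 2C(M_1 + M_2)\omega^2 + 2C^2(1 - \cos 2ka) = 0$; the function name and the numerical values are hypothetical.

```python
import math


def diatomic_omega_squared(k, a, C, M1, M2):
    """Return (acoustic, optic) values of omega^2 for wave number k.

    Solves M1*M2*w^4 - 2*C*(M1 + M2)*w^2 + 2*C^2*(1 - cos(2*k*a)) = 0,
    the nearest-neighbor diatomic-chain relation with repeat length 2a.
    """
    total = M1 + M2
    root = math.sqrt(total * total - 2.0 * M1 * M2 * (1.0 - math.cos(2.0 * k * a)))
    acoustic = C * (total - root) / (M1 * M2)
    optic = C * (total + root) / (M1 * M2)
    return acoustic, optic


# Hypothetical values with M1 > M2; k_max = pi / (2a) is the zone boundary.
a, C, M1, M2 = 1.0, 1.0, 2.0, 1.0
acoustic_edge, optic_edge = diatomic_omega_squared(math.pi / (2 * a), a, C, M1, M2)
acoustic_zero, optic_zero = diatomic_omega_squared(0.0, a, C, M1, M2)
```

At the zone boundary the two branches evaluate to 2C/M1 and 2C/M2, matching the limits stated in the text, and a gap remains between them whenever the two masses differ.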

A ray-tracing propagation example was performed in a simulation of an urban section of Rosslyn, Virginia [2], modeled with a direct path and multipaths down to 100 dB in reduced SNR. The average path delay spread was 16.2 nsec with an average of N = 6.3 path components. This lattice model of restricted DOA was supported by a histogram of 319 location measurements along one street of the simulation, having dominant peaks at ±90° (Figure 2 of Ref. 2). Thus, referring back to Equation 2, it is possible to generalize the summation for large N to a convolution, as:

$$r_m(t) \approx \int \alpha(t)\, s(t - \tau(t))\, d\tau \qquad (3)$$

for time periods $T \gg \tau$.

If one considers the motion of a receiver within the urban lattice geometry model as a collection of omni-receivers superimposed as an array receiver during the averaging time T, then one can make an analogy to other models of DOA. Merge and Wong [3] examine the DOA for a linear array of four elements at $\lambda/2$ spacing (e.g., $\lambda = 4a$), and solve for the eigenvalues of a fourth-order cumulant matrix ($C_k$). These values are limited to a signal subspace of dimension 2, whose separation with signal spread (DOA) is identical to the dispersion of the lattice model in Ref. 1 (see Figure 1 of Ref. 3). This supports the analogy of using a convolution model for urban propagation, which has a dispersion limitation on the DOA when averaging is performed over a long time relative to the propagation delays.

The key concept in using HOS is to deconvolve the effects of multipath over short motion times as follows. The inverse Fourier transform (-FFT) of the product of the transformed functions (FFT) is:

$$r_m(t) = [R_m(f)]^{-FFT}, \quad \text{with} \quad R_m(f) = S(f)\,A(f) \qquad (4),\ (5)$$

where

$$S(f) = [s(t)]^{FFT}\, e^{i 2\pi f_0 \tau} \qquad (6)$$

and

$$A(f) = [\alpha(t)\, e^{-i 2\pi f_0 \tau}]^{FFT} \qquad (7)$$

The deconvolution or equalization involves a complex division by the estimate of $A(f)$, written $1/\langle A(f) \rangle$ with $\langle\ \rangle$ denoting estimation, and the inversion of Equation (3) by solving for $s(t - \tau)$ as:

$$s(t - \tau) = [R_m(f)/\langle A(f) \rangle]^{-FFT} \qquad (8)$$

Since the estimate $\langle A(f) \rangle$ can be noise contaminated, this deconvolution using second-order estimates can have singularities. However, since $s(t)$ in general is non-Gaussian, a HOS approach to estimating the deconvolution can be free of Gaussian noise contamination, if done in fourth-order.
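The spectral division of Equation (8), and the singularity problem it can run into, can be sketched as follows. This is a minimal illustration using a plain DFT and circular convolution, not the fourth-order estimator the paper argues for; the symbol sequence, the two-path kernel, the `floor` threshold, and all function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Plain O(n^2) discrete Fourier transform (inverse includes the 1/n scale)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def deconvolve(received, kernel_estimate, floor=1e-12):
    """Recover s from r = s (*) a by spectral division, S(f) = R(f) / A(f).

    Raises ValueError when |A(f)| falls below `floor`, i.e. at a singularity.
    """
    R = dft(received)
    A = dft(kernel_estimate)
    if any(abs(a) < floor for a in A):
        raise ValueError("kernel spectrum has a (near-)zero bin; division is singular")
    return dft([r / a for r, a in zip(R, A)], inverse=True)


def circular_convolve(s, a):
    """Circular convolution, so that the DFT product relation holds exactly."""
    n = len(s)
    return [sum(s[m] * a[(k - m) % n] for m in range(n)) for k in range(n)]


# Hypothetical 4-PSK-like symbols and a two-path kernel (direct path + one echo).
symbols = [1, 1j, -1, -1j, 1, -1, 1j, -1j]
kernel = [1.0, 0.5, 0, 0, 0, 0, 0, 0]
received = circular_convolve(symbols, kernel)
recovered = deconvolve(received, kernel)
```

With an exact kernel the symbols are recovered to rounding error. An echo of equal strength, such as the kernel `[1.0, 1.0, 0, ...]`, places a spectral zero on a DFT bin and the division fails, which is the kind of singularity the text attributes to second-order estimates.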

Paper  Outline 

The  paper  extends  this  propagation  analogy  in  the  next 
section  into  the  various  publications  on  deconvolution 
using  HOS,  with  particular  focus  on  the  fourth  order 
formulations.  In  most  multipath  applications  the  received 
pathway  is  unknown  or  blind.  A  summary  of  these  results 
is  used  to  gain  insight  on  DOA  data  measurements  made  in 
urban  environments,  which  show  dispersive  effects  with 
motion  and  change  in  frequency.  These  results  are 
summarized  in  a  conclusion  section  to  argue  for  further 
theoretical  work  in  fourth-order  blind  deconvolution  for 
removing  the  multipath  effects  in  wideband  PCS  and 
WLAN  propagation  for  enhanced  SNR  and  higher  digital 
transmission  rates. 

2.  Background 

Typical  HOS  blind  deconvolution  techniques  have  used 
higher-order  spectra  and  cepstra  for  two  receivers  and  a 
Gaussian  model  for  the  convolution  noise[4].  The 
estimation  of  the  kernel  involves  a  nonlinear 
transformation  and  mean-square  error  (MSE)  technique.  A 
number  of  fourth-order  techniques  are  reviewed. 

Fourth-Order  HOS 

Fourth-order  HOS  cumulant  (HOC)  techniques [5]  have 
reduced  simulated  MSE  signal  error  by  up  to  30  dB  and 
preserved  the  phase  for  10^  estimation  iterations.  Such 
iterative  techniques  may  not  be  possible  in  long  coherence- 
length,  direct-sequence  spread-spectrum  (DSS) 
PCS/WLAN  modulation  techniques  for  large  numbers  of 
code  division  multiple  access  (CDMA)  users.  However, 
recent  fourth-order  techniques  in  data-link  applications 
have predicted success in larger sample sizes [6]. The use of multiple receiver data sets with the Kolmogorov theorem [7] suggests that a neural network (i.e., an extended Kalman filter) with a moving, single receiver can synthesize the single-source separation in the DOA problem for a two-source simulation, which in turn reduces the phase error of the deconvolution.

Since  the  digital  datalinks  under  consideration  using 
DSS  modulation  are  non-Gaussian,  the  recent  fourth-order 
HOS  approaches[8,9]  were  shown  to  be  superior  in  DOA 
applications  with  correlated  Gaussian  noises,  and  have 
been  extended  to  spatially  distributed  signals[3].  Other 
work[10]  has  shown  superiority  of  fourth-order  estimation 
in  digital  communication  applications  for  SNR  <  12  dB  in 
blind deconvolution under noisy conditions. Also, multipath application of fourth-order HOS has shown very long filter deconvolutions for SNR = 10 dB [11], and many independent signal points.

Multipath  Urban  Propagation 

Theoretical multipath propagation in urban environments has shown agreement with data collection using a free-space/conducting-slab model in 3D [12], with SNR range roll-off as powers of $R^n$ with n = -2.6 to -4.4. Similar data were collected in urban environments by CDS at a number of locations and frequencies. These losses had an n = -3 roll-off in the urban “lattice”, with an additional loss of 8-50 dB inside buildings. Narrow-band, indoor sweeps of frequencies in the 226 MHz and 910 MHz bands showed multipath fades of up to 30 dB over bandwidths of 30 MHz. This means that improved indoor PCS/WLAN applications require the estimation of a deconvolution kernel for moving receivers in urban environments, over short distances and in short times, and under low SNR. The next section highlights the changes in the DOA of these received signals.
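To give a feel for the quoted roll-off exponents, the following sketch converts an $R^n$ power law into a level change in dB between two ranges. The function name and the example ranges are hypothetical; only the exponents come from the text.

```python
import math


def rolloff_db(n, r_near, r_far):
    """Change in received level (dB) from r_near to r_far under an R**n power law."""
    return 10.0 * n * math.log10(r_far / r_near)


# Level change for a doubling of range, for free space and the quoted exponents.
per_doubling = {n: rolloff_db(n, 1.0, 2.0) for n in (-2.0, -2.6, -3.0, -4.4)}
```

Under this reading, the urban-lattice exponent n = -3 costs about 9 dB per doubling of range, compared with about 6 dB for free-space (n = -2) propagation.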

3.  Data  Results 

Figure 2 shows an approximated lattice geometry for the actual buildings used in the CDS data measurements. Calibration and sweep location sites are shown for the different multi-storied buildings. Figure 3 shows a 1.2 MHz narrow-band sweep in power (3a) and DOA (3b), with each point having a mean ($\phi$, solid) and standard deviation value (open) averaged over many samples in a fixed geometry. The true LOS DOA is over 45° in error from these mean measurements, with a few degrees of fluctuation error. This dispersion in DOA with frequency and reduced SNR (dBm level) clearly shows correlation. Figure 4 is the same data measurement, but with motion of human bodies in the room near the receiver. Even these slight reflections increase the estimation noise of the measurement by a factor of 5. This multipath reflection noise cannot be considered to be stochastic over short times, but places a new constraint on DOA estimation using HOS: stochastic motion noise. Similar results were measured in other buildings, with nonlinear and sometimes abrupt changes in DOA with slight frequency changes. This supports a lattice model effect in constraining urban RF propagation to switch between constrained pathways with slight changes in wavelength.

Figure 2. Urban Measurement Environment for DOA Sweeps

Figure 3. RF Sweep Building U - No Movement - 2nd Story A (Frequency 907.4 - 908.6 MHz F.S.; LOS DOA = 145°)

Figure 4. RF Sweep Building U - Movement - 2nd Story A (Frequency 907.4 - 908.6 MHz F.S.; LOS DOA = 145°)

4.  Conclusion 

An analysis of urban geometries and lattice wave propagation models has shown constrained propagation dependence on frequency, particularly in the DOA measurements. Recent fourth-order theoretical and simulated-data tests with HOS algorithms show promise in blind deconvolution approaches, which could potentially improve SNR and DOA estimation. However, multipath measurements show that slight changes in the mean DOA ($\phi$) estimation occur from motion of reflecting/blocking humans, and would be more severe from solid object motion. This example can be viewed as a distributed set of receiver measurements, if time averaging estimation is used in the blind deconvolution approach. These results point to a need for new theoretical algorithm development for improving PCS/WLAN technology.
Abstract

Pulse-echo reflection techniques are used for ultrasonic flaw detection in most commercial instruments. As the measured pulse-echo signal is assumed to be the result of linearly convolving the defect impulse response (IR) with the measurement system response, the objective is to remove the effect of the measurement system through a deconvolution operation and extract the defect impulse response. The major drawbacks of conventional second-order statistics (SOS)-based deconvolution techniques are their inability to identify non-minimum phase systems and their sensitivity to additive Gaussian noise. Our contribution is to show that higher-order statistics (HOS)-based deconvolution techniques are better suited to unraveling the effects of the measurement system and the additive Gaussian noise. Synthetic as well as real ultrasonic signals are used to support this claim.

1.  Introduction 

Pulse-echo reflection techniques are used for ultrasonic flaw detection in most commercial instruments [1]. The ultrasonic wave, generated by a piezoelectric transducer coupled to the test specimen, propagates through the material, and part of its energy is reflected if the wave encounters an inhomogeneity or discontinuity in its path, while the remainder is reflected by the back surface of the test specimen. A typical oscilloscope display is shown in Fig. 1. The first wavelet represents the initial voltage applied to the transducer in order to generate the wave, while the successive echoes represent the voltage generated by the reflected waves (from the flaw and the back surface, respectively) impinging on the transducer. The flaw echo in Fig. 1 contains information regarding the material discontinuity that the ultrasonic wave has encountered in its path. For this reason, signal processing is applied to the flaw echo only, and the other echoes are discarded from subsequent signal display.


Fig. 1: Typical oscilloscope display of an ultrasonic examination.
Flaw echo signals are masked by the characteristics of the measuring instruments and the propagation paths taken by the ultrasonic wave, and are corrupted by additive noise. It is assumed that the measured flaw echo is obtained by linearly convolving the flaw or defect impulse response with the measurement system response. The deconvolution operation therefore seeks to undo the effect of the convolution and extract the defect impulse response, which is essential for defect identification.

Conventional deconvolution techniques (CDT) such as least squares, the Wiener filter, and minimum variance deconvolution [2] are based on a priori knowledge of the second-order statistics (SOS) of the noise and the input signal. In practice, however, the acoustic noise due to scattering from the grains inside the propagation medium does not have readily known statistics [3]. Moreover, ultrasonic pulse echoes are found to be non-minimum phase systems. SOS-based deconvolution techniques, being phase-blind, therefore cannot accurately estimate the defect impulse response.

The objective of this paper is to formulate the defect ultrasonic model in the polyspectrum domain, where the processing is better suited to unraveling the effect of the measurement system and the additive Gaussian noise. Thereafter, the defect impulse response is recovered from its noise-free polyspectrum. Synthesized as well as real ultrasonic signals are used to show that the proposed technique outperforms the conventional SOS-based deconvolution techniques commonly used in NDT.

2.  Theory 

A measured ultrasonic flaw signal, $y(t)$, can be modeled as the convolution of the measurement system response function, $x(t)$, with the flaw’s impulse response function, $h(t)$, plus noise, $N(t)$. This model can be written as

$$y(t) = x(t) \otimes h(t) + N(t) \qquad (1)$$

where $\otimes$ denotes the convolution operation. With this model, a defect of a particular geometry would be completely characterized by its impulse response. Estimation of $h(t)$ in (1) from the knowledge of $y(t)$ and $x(t)$ is variously known as system identification, filtering, or simply deconvolution. Many deconvolution techniques have been developed in different engineering areas such as seismic exploration, military applications, and medical imaging. Chen [2] has studied the feasible applications of these deconvolution techniques to ultrasonic NDT, and has concluded that the Wiener filter is a good candidate for such applications. The main drawbacks of CDT are their inability to identify non-minimum phase systems and the fact that their optimal implementation requires a priori knowledge of the noise statistics. These drawbacks can be completely alleviated when using HOS-based deconvolution techniques, as is shown in this paper.
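The forward model in (1) can be sketched as follows, assuming white Gaussian noise scaled to a target SNR defined as the ratio of average signal power to noise variance. The reference pulse, the impulse response, and the function names are hypothetical and stand in for measured data.

```python
import math
import random


def convolve(x, h):
    """Linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def measured_echo(x, h, snr_db, rng):
    """y = x (*) h + N, with white Gaussian N scaled to the requested SNR in dB."""
    clean = convolve(x, h)
    signal_power = sum(v * v for v in clean) / len(clean)
    sigma = math.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    noisy = [v + rng.gauss(0.0, sigma) for v in clean]
    return noisy, clean


# Hypothetical reference pulse: a Gaussian-windowed tone, 512 samples long.
x = [math.exp(-((n - 256) / 40.0) ** 2) * math.cos(0.9 * n) for n in range(512)]
h = [1.0, -0.5, 0.25]  # hypothetical defect impulse response
y, clean = measured_echo(x, h, snr_db=5.0, rng=random.Random(0))
```

Deconvolution is the inverse task: given `y` and `x`, estimate `h`. The clean sequence is returned alongside the noisy one only so that the achieved SNR can be checked.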

Equation (1) can be written in the polyspectrum domain as [4]

$$C_n^y(w_1, \ldots, w_{n-1}) = C_n^x(w_1, \ldots, w_{n-1})\, H(w_1) H(w_2) \cdots H(w_{n-1})\, H^*(w_1 + w_2 + \cdots + w_{n-1}) + C_n^N(w_1, \ldots, w_{n-1}) \qquad (2)$$

where $C_n^s(w_1, \ldots, w_{n-1})$ is the nth-order spectrum of the signal $s(t)$ (which could be $y(t)$, $x(t)$, or $N(t)$), $H(w)$ is the Fourier transform of the defect impulse response $h(t)$, and $w$ is the angular frequency. Without loss of generality, (2) can be rewritten as

$$C_n^y(w_1, \ldots, w_{n-1}) = C_n^x(w_1, \ldots, w_{n-1})\, C_n^h(w_1, \ldots, w_{n-1}) + C_n^N(w_1, w_2, \ldots, w_{n-1}) \qquad (3)$$

For Gaussian noise, the polyspectrum (n > 2) of $N(t)$ is zero, and thus the noise-free polyspectrum of the defect impulse response can be calculated from (3) and used to recover $h(t)$. Alternatively, if the bispectrum is used, i.e., n = 3 above, then the noise does not have to be Gaussian to be filtered out from (2) and (3). It can have any symmetric probability density function (PDF). With one of these noise assumptions in mind, equations (2) and (3) represent the basis for the HOS-based deconvolution technique used in this paper.
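The noise-suppression argument rests on the third-order cumulant, whose Fourier transform is the bispectrum, vanishing for symmetrically distributed noise. A minimal sketch of a sample estimate is given below; the estimator normalization, the sample sizes, and the two test distributions are assumptions made only for illustration.

```python
import random


def third_cumulant(x, tau1=0, tau2=0):
    """Sample third-order cumulant c3(tau1, tau2) of a sequence, mean removed."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    span = range(max(0, -tau1, -tau2), min(n, n - tau1, n - tau2))
    return sum(c[t] * c[t + tau1] * c[t + tau2] for t in span) / n


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
skewed = [rng.expovariate(1.0) for _ in range(20000)]

c3_gaussian = third_cumulant(gaussian)   # close to 0 for a symmetric PDF
c3_skewed = third_cumulant(skewed)       # close to 2 for a unit-rate exponential
```

The Gaussian record contributes almost nothing at third order, while the skewed record does, which is why a non-symmetric signal remains visible in the bispectrum after symmetric noise has dropped out.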

3.  Results 

In this section, the HOS-based deconvolution technique is tested on synthesized as well as real ultrasonic signals obtained from artificial defects [2]. For computational efficiency, only the bispectra of the input-output signals are used. In addition, as the recovery of a signal from its bispectrum is not a one-to-one transformation, we calculate the bicepstrum using the relationship between the bicepstrum and the bispectrum defined by Pan and Nikias [5]; thereafter, the defect impulse response is recovered using the bicepstral parameters [6].


Fig. 2: The ultrasonic defect model

3.1 Synthesized Data
With reference to Fig. 2, the input signal, x(t), is taken as a Gaussian pulse that amplitude-modulates a single-tone carrier whose frequency lies in the ultrasonic range. The noise, N(t), having a normally distributed PDF, is scaled by a constant, a, to account for different signal-to-noise ratios (SNR). Three different linear time-invariant systems are considered in this paper, namely:

•  A non-minimum phase moving average (MA) system whose transfer function is given by

$$H(z) = 0.2197 z^{2} - 0.747 z + 0.6085 + 0.1533 z^{-1} \qquad (4)$$

•  A minimum phase autoregressive (AR) system whose transfer function is given by

$$H(z) = \frac{1}{1 - 0.7 z^{-1} + 0.6 z^{-2} - 0.3 z^{-3}} \qquad (5)$$

•  A non-minimum phase autoregressive moving average (ARMA) system whose transfer function is given by

$$H(z) = \frac{1 - 3.25 z^{-1} + 3.5399 z^{-2} - 1.2487 z^{-3}}{1 - 1.86 z^{-1} + 1.47 z^{-2} - 0.5246 z^{-3}} \qquad (6)$$
For a given SNR, the output signal, y(t), is computed using the model of Fig. 2. The bispectrum (n = 3) of the system impulse response (SIR) is obtained from (3), and used to recover h(t) using the bicepstral parameters as stated above. To test the performance of the proposed technique, the variance of the error signal (between the true and estimated SIR signals of the MA system above) is computed for each SNR. Fig. 3 shows this result for SNRs as low as -5 dB. For comparison, a similar error variance is computed when the MA SIR is estimated using the Wiener filter, and is shown in the same figure. It can be clearly seen that the HOS-based deconvolution technique outperforms its CDT counterpart, represented here by the Wiener filter.
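The performance measure described above can be sketched as follows, assuming the two impulse responses are already time-aligned, have equal length, and that the population (1/n) form of the variance is intended. The function name is hypothetical.

```python
def error_variance(true_ir, estimated_ir):
    """Variance of the sample-by-sample error between two impulse responses."""
    errors = [t - e for t, e in zip(true_ir, estimated_ir)]
    mean = sum(errors) / len(errors)
    return sum((err - mean) ** 2 for err in errors) / len(errors)
```

A perfect estimate gives zero; an estimate that misses a unit-amplitude tap entirely in a four-sample response, for example, gives 0.1875.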


Fig. 3:  Error  variance  of  the  MA  system  impulse 
response. 

To quantify the effect of these error variances on the estimated MA SIR, a plot of the latter at an SNR of 5 dB is shown in Fig. 4. It can be seen that while the impulse response estimated by the HOS technique is faithfully reproduced, the Wiener filter, at an error variance of about 2.2, fails completely.


Fig. 4: MA system impulse responses obtained from HOS (top) and Wiener filter (bottom) deconvolution techniques for 5 dB SNR.

To complete this section, the AR and ARMA systems defined by (5) and (6) are tested, and their corresponding SIRs, estimated using the proposed technique for an SNR of 5 dB, are shown in Figs. 5 and 6, respectively.


Fig.  5:  AR  system  impulse  response  :  true  and 
estimated  SIR  for  a  SNR  =  5  dB. 

Again, the HOS-based deconvolution technique, with its potential for preserving the phase information, faithfully reproduces the SIR of both minimum (Fig. 5) and non-minimum (Fig. 6) phase systems even at extremely low SNR. The small variations shown in Figs. 5 and 6 may be attributed to errors made in computing the cepstral parameters from the bicepstrum [5].
Fig. 6: ARMA system impulse response: true and estimated SIR for an SNR = 5 dB.


3.2  Real  Ultrasonic  Data 

The proposed deconvolution technique is tested using real ultrasonic data [2], which is part of a larger data set obtained from the Army’s Material Technology Laboratory (Watertown, MA). The input signal, x(t), is measured in practice from a flawless sample (A0) having the same characteristics as the specimen under test. It is sometimes referred to as the reference signal. Two artificial defects are considered, namely a flat-cut hole (A1) and an angular-cut hole (A2), in aluminum blocks (see [2] for an illustration of these defect geometries). The center frequency of the transducer used is 15 MHz, and the A-scan signals contain 512 data points digitized at a rate of 100 MHz. The pulse-echo signals corresponding to the A0, A1, and A2 samples are represented by T15A0, T15A1, and T15A2, respectively. For clarity, the signal T15A1 is shown in Fig. 7.


Fig. 7: Ultrasonic pulse echo measured from sample A1.

When the bispectrum-based deconvolution technique is applied to the real ultrasonic signals T15A1 and T15A2, with T15A0 taken as the reference signal, smooth, oscillation-free impulse responses are obtained, as shown in Figs. 8 and 9. For comparison with CDT, the reader is referred to [7], where the same signals have been used.


Fig. 8: Impulse response of the flat-cut hole (A1).


Fig.  9:  Impulse  response  of  the  angular-cut  hole  (A2). 

4.  Conclusion 

In this paper, we have shown that the drawbacks of the SOS-based CDT are completely removed when HOS-based deconvolution techniques are used. Synthesized as well as real ultrasonic signals have been used to demonstrate this claim. Although we have focused on a non-parametric deconvolution technique and on the bispectrum case of the polyspectra, higher-order based, parametric blind deconvolution techniques can also be used to remove the effect of the measured reference signal. Future work will be directed towards the influence of different reference signal models on the deconvolved defect impulse response using both polyspectra and polycepstra of real ultrasonic signals.
Abstract

Mobile communication links require adaptive equalization with a fast rate of convergence while keeping computational effort at reasonable levels. In this paper we propose to combine known algorithms for blind equalization in order to exploit their desirable properties to reach this goal. A switching criterion is proposed which is based on the change in the equalizer impulse response between iterations of the adaptation algorithm and may be used to detect changes of the channel impulse response. Algorithms under consideration include Godard's algorithm, the Stop-and-Go algorithm, and the tricepstrum equalization algorithm (TEA).


1.  Introduction 

Blind higher-order statistics (HOS) based equalization algorithms are used in a wide range of signal processing applications. Some of these applications, like biomedical and seismic signal processing, permit offline processing of data, while others, like reduction of intersymbol interference (ISI) in mobile communications, require continuous processing of received data.

ISI introduced by channel fading is a major limitation for mobile link performance. Due to movements of users and of shadowing and reflecting obstacles in such environments, channel characteristics change continually. These rapid changes require adaptive equalization algorithms with fast convergence. Such algorithms are often computationally expensive, while others that are less computationally expensive have a slower rate of convergence.

In this paper, we propose to use fast but complex algorithms only for a startup phase and to switch to slower but simpler algorithms for tracking after a fixed number of iterations. A metric is presented to detect significant changes of the channel impulse response, allowing a switch back to the faster algorithm.
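The switching idea can be sketched as a small controller. This is only an illustration under stated assumptions: the change metric is taken to be the Euclidean norm of the tap change between iterations, and the class name, the threshold, and the mode labels are hypothetical rather than the paper's own definitions.

```python
import math


class AlgorithmSwitch:
    """Chooses between a fast start-up algorithm and a simple tracking algorithm."""

    def __init__(self, startup_iterations, change_threshold):
        self.startup_iterations = startup_iterations  # fixed start-up length
        self.change_threshold = change_threshold      # hypothetical tuning value
        self.remaining = startup_iterations
        self.previous = None

    def update(self, equalizer_taps):
        """Return 'fast' or 'slow' for the next iteration."""
        change = 0.0
        if self.previous is not None:
            change = math.sqrt(
                sum(abs(w - p) ** 2 for w, p in zip(equalizer_taps, self.previous))
            )
        self.previous = list(equalizer_taps)
        if self.remaining > 0:
            self.remaining -= 1
            return "fast"
        if change > self.change_threshold:
            # Large tap change: treat it as a channel change and restart start-up.
            self.remaining = self.startup_iterations - 1
            return "fast"
        return "slow"


switch = AlgorithmSwitch(startup_iterations=2, change_threshold=0.5)
tap_history = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
modes = [switch.update(taps) for taps in tap_history]
```

In this toy run the controller uses the fast algorithm for two iterations, tracks with the slow one, and returns to the fast one when the taps jump at the fifth iteration.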


The  organisation  of  the  paper  is  as  follows.  In  section  2 
the  existing  algorithms  used  in  this  paper  are  described  and 
compared.  In  section  3  the  concept  of  combining  algorithms 
and  the  switching  criterion  are  explained.  Section  4  gives 
the  simulation  results  for  the  proposed  algorithms. 

2.  Existing  algorithms 

HOS based algorithms for blind equalization are either of Bussgang type, which use HOS implicitly, or are based explicitly on HOS. Bussgang type algorithms based on LMS adaptation use a substitute for the desired data, obtained by a memoryless nonlinear transform of the received data. Various algorithms using different nonlinearities have been proposed in the literature [1], [2], [3], [4], [5].

An  algorithm  based  explicitly  on  HOS  is  the  tricepstrum 
equalization  algorithm  (TEA)  [6].  This  algorithm  identifies 
the  (possibly  nonminimum  phase)  impulse  response  of  the 
channel  and  uses  its  inverse  for  the  equalizer. 

2.1.  Bussgang  type  algorithms 


The basic principle of a Bussgang type equalizer is shown
in Fig. 1. The two Bussgang type algorithms used in this paper
are the Godard algorithm [2] and the Stop-and-Go algorithm [4].


Figure  1.  Bussgang  algorithm  equalizer 
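To make the Bussgang principle concrete, the following is a minimal sketch of one LMS-type update that uses the Godard nonlinearity as the substitute for the desired data. It assumes the standard Bussgang form of Godard's nonlinearity, g[z] = (z/|z|)(|z| + R_p|z|^(p-1) - |z|^(2p-1)) with R_p = E[|x|^(2p)]/E[|x|^p], and an equalizer output computed as z = sum of w_n u_n without conjugation. The function names, the step size `mu` and the tap layout are illustrative choices, not taken from the paper.

```python
def godard_nonlinearity(z, r_p, p=2):
    """Memoryless Godard nonlinearity g[z] in Bussgang form (assumed standard form)."""
    mag = abs(z)
    if mag == 0.0:
        return 0j  # g is undefined at the origin; returning 0 is a convention chosen here
    return (z / mag) * (mag + r_p * mag ** (p - 1) - mag ** (2 * p - 1))


def dispersion_constant(symbols, p=2):
    """R_p = E[|x|^(2p)] / E[|x|^p], estimated over a known symbol alphabet."""
    num = sum(abs(x) ** (2 * p) for x in symbols) / len(symbols)
    den = sum(abs(x) ** p for x in symbols) / len(symbols)
    return num / den


def bussgang_step(weights, window, mu, r_p, p=2):
    """One LMS-type update: the error is g[z] - z, with z the equalizer output."""
    z = sum(w * u for w, u in zip(weights, window))
    error = godard_nonlinearity(z, r_p, p) - z
    new_weights = [w + mu * error * u.conjugate() for w, u in zip(weights, window)]
    return new_weights, z
```

For p = 2 the error term reduces to z(R_2 - |z|^2), so the taps stop moving once the output modulus satisfies |z|^2 = R_2. The phase of z does not enter the error.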

Godard's algorithm uses a nonlinearity of the form

$$g[z(k)] = \frac{z(k)}{|z(k)|}\left[\,|z(k)| + R_p\,|z(k)|^{p-1} - |z(k)|^{2p-1}\right] \qquad (1)$$

with

$$R_p = \frac{E\left[|x(k)|^{2p}\right]}{E\left[|x(k)|^{p}\right]} \qquad (2)$$

where $z(k)$ is the signal at the equalizer output and $x(k)$
are the transmitted symbols. In our simulations we chose
p = 2. The transmitted symbols are assumed to be independent
and identically distributed (i.i.d.) with non-Gaussian
distribution. The perturbations introduced by the channel
are ISI and additive white Gaussian noise.

As the cost function of the Godard algorithm is not convex,
convergence to the global minimum of ISI is not guaranteed.
However, proper choice of the step size parameter and of the
initial equalizer gain can ensure convergence [2].

The Stop-and-Go algorithm [4] is a modification of the
decision directed (DD) algorithm [1]. The DD algorithm
is known to require a good initial estimate to be able to
track the signal (open eye condition). The Stop-and-Go
algorithm extends the DD algorithm to have the capability
of blind equalization by stopping equalization for the current
iteration whenever the probability that the decided sign is
the correct one is too small. The corresponding memoryless
nonlinearity has the form

$$g[z(k)] = z(k) - \left(f(k)_R\,\hat{e}(k)_R + j\,f(k)_I\,\hat{e}(k)_I\right) \qquad (3)$$

where $\hat{e}(k)_R = \mathrm{Real}[z(k) - \hat{x}(k)]$ and
$\hat{e}(k)_I = \mathrm{Imag}[z(k) - \hat{x}(k)]$ are the real and
imaginary parts of the difference between the received symbol and
the decided symbol $\hat{x}(k)$. The flags $f(k)_R, f(k)_I \in \{0, 1\}$
are chosen to be one if and only if the signs of the real and
imaginary part, respectively, of $\hat{e}(k)$ and the Sato-like
error $\tilde{e}(k) = z(k) - \beta \cdot \mathrm{csgn}[z(k)]$ agree.
The constant $\beta$ is real. The complex sign function is denoted
by csgn[·]. The Stop-and-Go algorithm typically converges in two
phases: a startup phase with slow convergence and a rapidly
converging second stage.

Common to both Bussgang type algorithms is that the
phase of the input symbols cannot be detected correctly.

2.2. Explicit HOS based algorithms

An explicitly HOS based algorithm is the tricepstrum
equalization algorithm (TEA) [6]. This algorithm proceeds
in 3 steps: 1) (adaptive) estimation of a number of second
order moments and fourth order cumulants, 2) forming an
overdetermined set of linear equations for the coefficients of
the tricepstrum and solving this system, and 3) calculating
the equalizer impulse response from the tricepstrum
coefficients. Detailed descriptions of the TEA algorithm can be
found in [6], [7]. The structure of an equalizer using TEA
is shown in Fig. 2.

Figure 2. TEA equalizer

The advantages of this technique over Bussgang algorithms
are the speed of convergence, the correct estimation
of the channel phase, and the absence of convergence problems.
However, the most severe problem is the significant
computational cost of estimating the cumulants and
solving the equation system. Another drawback is that the
steady-state level of intersymbol interference is significantly
higher than for Bussgang algorithms. This condition can be
improved by proper choice of the number of cumulants used
for the algorithm.


3.  Algorithm  combinations 
3.1.  Motivation 

Mobile communication channels undergo rapid changes.
At high data rates the channel is assumed to be quasi static
over a certain number of symbols. Data is then transmitted
in packets with a length according to the variation rate of
the channel. At the beginning of transmission of each data
packet a training sequence is sent to allow the estimation
of the equalizer coefficients. During data transmission, the
equalizer impulse response is kept constant or a simple
algorithm (like DD) is used for tracking. When the channel
undergoes a significant change, tracking will be lost.

An improvement is to use blind equalization algorithms,
which do not require a training sequence. Examination
of existing algorithms shows that there is a trade-off
between rate of convergence and computational complexity.
It is possible to use a complex but fast converging algorithm
for an initial startup and switch later to a simple algorithm
for tracking. This scheme can be further improved by basing
a switching criterion on the information available at the
receiver to detect significant channel changes.

3.2. Single channel

Simulations have been performed using a single static
channel. As a first stage, equalization has been carried out
using the TEA algorithm. After a fixed number of iterations
the algorithm has been changed to a Bussgang type. It turned
out that as few as 100 samples for the TEA are enough to
make equalization twice as fast as using Bussgang type
algorithms alone (see Figs. 3 and 4). The performance criterion,
intersymbol interference (ISI), is explained in section 4.
Using more samples for TEA equalization does not lead to
faster convergence. After switching from TEA to Bussgang,
an increase in ISI can be observed before it converges to its
final value. An important feature of the combined algorithm
is that the initial phase estimate of the TEA algorithm is
kept by the following Bussgang equalization. Bussgang
algorithms themselves are unable to estimate the correct phase
of the data.

3.3.  Multiple  channels 

When the channel changes significantly it is desirable to
detect these changes and switch back to the faster algorithm.
A decision criterion has to be found which is based only on
information available at the receiver, i.e. on the detected
data or on the equalizer impulse response. The criterion
proposed in the following is derived from the equalizer
impulse response. The first step is to find a measure of the state
of equalization. The change of the equalizer impulse response

$$q(k) = 10 \cdot \log_{10}\left(\sum_n \left|w_{on}(k) - w_{on}(k-1)\right|^2\right), \qquad (4)$$

where the $w_{on}$ are the tap weights of the normalized
equalizer impulse response ($\sum_n |w_{on}|^2 = 1$), seems to be a
natural choice. When the equalizer is not initialized, the first
changes in the equalizer impulse response will be large until
the tracking mode is reached. Then smaller changes result
from noise and random input data. When the channel
changes considerably, the equalizer impulse response will
show larger variations again. To recognize a trend, averaging
is performed on this criterion in such a way that the mean
value of recent iterations is compared to the mean value and
the standard deviation of a much larger number of previous
samples:

$$m_1(k) = \frac{1}{N_1}\sum_{n=k-N_1+1}^{k} q(n) \qquad (5)$$

$$m_2(k) = \frac{1}{N_2}\sum_{n=k-N_1-N_2+1}^{k-N_1} q(n) \qquad (6)$$

$$\sigma_2(k) = \sqrt{\frac{1}{N_2}\sum_{n=k-N_1-N_2+1}^{k-N_1}\left(q(n) - m_2(k)\right)^2} \qquad (7)$$

Switching between the algorithms occurs when

$$m_1(k) > m_2(k) + a \cdot \sigma_2(k) \qquad (8)$$

where a is a real constant. A channel change is assumed
as soon as the mean of the most recent samples of q(k) rises
a certain level above the mean of the past samples. A multiple
of the standard deviation is used to avoid switching caused by
noise and to provide an adaptive measure for switching. The
choice of $N_1$, $N_2$, and a is essential for a reliable detection
of channel changes.
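The switching rule can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' implementation: the tap-change measure q(k) is taken as the squared Euclidean distance between consecutive tap vectors in dB, the standard deviation uses a 1/N2 normalisation, and the function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def tap_change_db(w_now, w_prev):
    """q(k): change of the normalized equalizer taps between iterations, in dB."""
    energy = sum(abs(a - b) ** 2 for a, b in zip(w_now, w_prev))
    return 10.0 * math.log10(max(energy, 1e-300))  # floor avoids log10(0)


def channel_change_detected(q, n1, n2, a):
    """Return True when m1(k) > m2(k) + a * sigma2(k) for the history q[0..k]."""
    if len(q) < n1 + n2:
        return False                    # not enough history yet
    recent = q[-n1:]                    # last N1 samples, gives m1(k)
    past = q[-(n1 + n2):-n1]            # the N2 samples before them
    m1 = fmean(recent)
    m2 = fmean(past)
    sigma2 = pstdev(past)               # population std (1/N2 normalisation assumed)
    return m1 > m2 + a * sigma2
```

In a tracking loop one would append `tap_change_db(w, w_old)` to the history after every Bussgang update and hand control back to the fast algorithm as soon as `channel_change_detected` returns True.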

4.  Simulation  results 

A number of simulations have been carried out to demonstrate
the viability of the above concepts. Three different channel
impulse responses with four taps each have been considered.


tap    channel 1            channel 2            channel 3
0      -0.0810+0.0612i      -0.1411-0.0040i       0.1820+0.0009i
1       0.9249-0.9003i       0.9496-0.9194i      -0.4942+0.4470i
2       0.2825+0.3650i       0.2764+0.3704i       0.6461-0.794i
3      -0.0779-0.0887i       0.0779-0.0887i       0.2

Channels  1  and  2  have  similar  rms  delay  spread  (0.3633 
and  0.3643),  while  channel  3  has  a  rms  delay  spread  of 
0.5654. 
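The quoted delay spreads can be checked numerically. The sketch below assumes the usual power-weighted definition (the standard deviation of the tap index, weighted by tap power), which the text does not spell out, and uses the channel 1 taps as read from the table in column order.

```python
import math


def rms_delay_spread(taps):
    """Power-weighted standard deviation of the tap index (in symbol periods)."""
    power = [abs(h) ** 2 for h in taps]
    total = sum(power)
    mean_delay = sum(k * p for k, p in enumerate(power)) / total
    mean_square = sum(k * k * p for k, p in enumerate(power)) / total
    return math.sqrt(mean_square - mean_delay ** 2)


# Channel 1 taps as read from the table (assumed to be listed column-wise)
channel_1 = [-0.0810 + 0.0612j, 0.9249 - 0.9003j,
             0.2825 + 0.3650j, -0.0779 - 0.0887j]
print(round(rms_delay_spread(channel_1), 3))
```

Under this definition the result for channel 1 agrees with the value of 0.3633 quoted above to within rounding.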

For faster operation the tricepstrum equalization algorithm
has been implemented such that the equation system
is updated only once every 100 iterations, while the cumulants
are updated at each iteration. This means that for the
combined algorithm the computationally expensive part of
the TEA has to be performed only once.

Another crucial point of implementation is the proper
choice of the parameters $N_1$, $N_2$, and a of the switching
criterion. The following dependencies have been observed:
A value of $N_1 = 100$ is sufficient. $N_2$ should be at least
three times larger than $N_1$. The larger $N_2$, the more reliable
the criterion. The parameter a has an optimal value: if a is
too small, normal fluctuations of q(k) without changes
of the channel cause false alarms; if a is too large, no
channel changes will be detected. The parameters depend
on the algorithm used.

The ISI of the cascaded channel and equalizer impulse response
s with the largest absolute tap value $|s|_{\max}$ is [8]

$$\mathrm{ISI}(s) = 10 \cdot \log_{10}\left(\frac{\sum_n |s_n|^2 - |s|_{\max}^2}{|s|_{\max}^2}\right)\ [\mathrm{dB}] \qquad (9)$$


Figure  3.  ISI  performance  for  channel  1  and 
5  Monte  Carlo  simulations 
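The ISI curves in this section are based on the residual-ISI measure of the combined channel and equalizer response. A minimal sketch of that computation follows, assuming the usual definition (total tap energy minus the energy of the largest tap, divided by the energy of the largest tap, in dB). The helper names and the example taps are illustrative.

```python
import math


def convolve(h, w):
    """Combined impulse response s = h * w of channel h and equalizer w."""
    s = [0j] * (len(h) + len(w) - 1)
    for i, hi in enumerate(h):
        for j, wj in enumerate(w):
            s[i + j] += hi * wj
    return s


def isi_db(s):
    """Residual ISI in dB; returns -inf for a perfectly equalized response."""
    power = [abs(x) ** 2 for x in s]
    peak = max(power)
    residual = sum(power) - peak
    if residual <= 0.0:
        return float("-inf")
    return 10.0 * math.log10(residual / peak)
```

A two-tap channel [1, 0.5] followed by the two-tap equalizer [1, -0.5] gives the combined response [1, 0, -0.25] and hence an ISI of 10·log10(0.0625), roughly -12 dB.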


Figure 3 shows the ISI performance of the basic algorithms
for channel 1. For each algorithm 5 Monte Carlo
simulations have been performed. It can be seen that the
Godard algorithm converges faster than Stop-and-Go, but
settles at a higher level of ISI. The performance of LMS
adaptive algorithms strongly depends on the step size parameter
chosen. The implemented TEA algorithm converges
much faster than the Bussgang type algorithms but settles at
an even higher level of ISI.

Figure 4 shows how the speed of convergence is improved
by combining algorithms. Again, the channel used for the
simulations is channel 1. After a fixed number of iterations
the TEA algorithm is switched off and Bussgang algorithms
are used. It can be seen that the number of TEA iterations
performed does not have a significant influence on the speed
of convergence of the following algorithm.

A  disturbing  fact  is  that  after  switching  the  ISI  goes  up 
instead  of  down.  It  is  desirable  to  reduce  this  effect  and  to 
obtain  a  continuous  decrease  of  ISI. 

Figures  5  and  6  demonstrate  the  effect  of  changes  in  the 
channel  on  ISI  and  the  error  criterion.  To  allow  recognizing 
a  trend  in  the  error  plots  averaging  over  100  samples  has 
been  performed.  The  first  channel  simulated  in  Fig.  5  and 
Fig.  6  is  channel  1.  After  8000  samples  the  channel  has  been 
switched  to  channel  2  or  3,  respectively. 

5.  Conclusions 

It is indeed possible to speed up the convergence of Bussgang
algorithms by combining them with an initial stage of
TEA initialization. Since the TEA algorithm uses up to fourth
order statistics and the initial stage takes only 100 samples, it
may be expected that the contribution of the TEA algorithm
to equalization is rather small. More important is that a correct
estimate of the channel's phase relations is given, which
is retained by the Bussgang algorithm. This is of particular
interest since Bussgang algorithms are not able to estimate
the correct phase relations.

The  second  point  of  this  paper  is  the  derivation  of  a 
reliable  criterion  to  detect  channel  changes.  It  turns  out  that 
the  change  of  equalizer  impulse  response  is  able  to  give  this 
measure.  The  parameters  of  the  criterion  have  been  found 
experimentally  and  depend  on  the  algorithm  used.  Higher 
reliability  is  obtained  using  the  Stop-and-Go  algorithm. 

Further research on this subject has to be done. This
includes consideration of more realistic channels. A number
of statistical models for both indoor and outdoor mobile
communication exist [9]. The increase of ISI after switching
from TEA to Bussgang algorithms has to be investigated
to improve the rate of convergence even further.

Figure 5. Multiple channels, TEA and Godard algorithm
(panel title: TEA and Godard algorithm, channels 1->2; horizontal axis: number of iterations)

Figure 6. Multiple channels, TEA and Stop-and-Go algorithm
(panel title: TEA and Stop-and-Go algorithm, channels 1->2; horizontal axis: number of iterations)
Abstract

System reconstruction from arbitrarily selected slices
of the n-th order output spectrum is considered. We
establish that unique identification of the impulse response
of a system can be performed, up to a scalar
and a circular shift, based on any two horizontal slices
of the discretized n-th order output spectrum (n ≥ 3),
as long as the distance between the slices and the grid
size satisfy a simple condition. For the special case of
real systems, one slice suffices for reconstruction. The
ability to select the slices to be used for reconstruction
enables one to avoid regions of the n-th order spectrum
where the estimation variance is high, or where the
ideal bispectrum is expected to be zero, as in the case of
bandlimited systems. We propose a mechanism for selecting
slices that result in improved system estimates.
We also demonstrate via simulations the superiority,
in terms of estimation bias and variance, of the proposed
method over existing approaches in the case of
bandlimited systems.


1.  Introduction 

System reconstruction has been a very active field
of research in recent years. Higher-order spectra
(HOS) have been applied successfully to the problem
of system reconstruction, mainly because of their ability
to preserve the true system phase, and their robustness
to additive Gaussian noise of unknown covariance.
System reconstruction methods can be divided into two
main categories: parametric and non-parametric.
Parametric methods fit a specific model
to the output observations and use the output statistics
to identify the model parameters. Sensitivity
to model order mismatch is their main disadvantage.
Non-parametric methods reconstruct the system by recovering
its Fourier phase and magnitude. In this paper
we consider non-parametric system reconstruction
methods.

*This work was supported by NSF under grant MIP-9553227.

Non-parametric methods can be divided into two
main sub-categories: those that utilize the whole bispectrum
information ([4], [1], [6], [8], [5]), and those
that require bispectrum slices ([3], [2]). Methods that
use fixed bispectrum slices cannot be applied to the
reconstruction of bandlimited systems, since, depending
on the system, the ideal bispectrum along these slices
can be zero. Moreover, in the presence of noise and
finite data records, bispectrum estimates along fixed
slices can exhibit high estimation variance, and since
single slices are used, there is no averaging mechanism
to reduce estimation errors.

Several bispectrum slices within the bispectrum
principal domain have been used in a method proposed
in [1], and averaging was performed over the frequency
response sample estimates. However, as the sample
number decreases, the number of independent realizations
of the corresponding frequency response sample
over which averaging can be performed decreases too.
As a result, low frequency samples of the frequency
response exhibit considerable estimation variance in the
presence of noise or data of finite length. Errors in
low frequency samples can propagate to the remaining
samples, since the method is iterative in nature. However,
the method performs well in the case of wideband
systems. The methods in [4] and [8] can also be viewed
as methods that use several bispectrum slices. According
to these methods, the non-redundant bispectrum is
used to form a linear system of equations, which can
be solved for the unknown frequency response samples.
Although the system of equations is overdetermined
and a solution could be obtained even if some slices
were discarded, there is no mechanism for selecting the
"best" slices to use, nor is there any guarantee that
certain bispectrum regions can be avoided.
In this paper, we consider the possibility of system
reconstruction from arbitrarily selected higher-order
spectra slices of the system output. We first establish
that unique identification of the impulse response
of an arbitrary system can be performed, up to a scalar
and a circular shift, based on any two horizontal slices
of the discretized n-th order spectrum of the system
output, of any order n ≥ 3, as long as the distance
between the slices and the grid size satisfy a simple
condition. When the system is real, one slice suffices
for system reconstruction. We then propose a method
for system reconstruction based on a pair of selected
HOS slices of the system output. In [7] it was shown
that the obtained system estimates are asymptotically
unbiased and consistent. We propose a mechanism to
select the slices that will result in improved system
estimates. We demonstrate via simulation examples the
superiority of the proposed method over existing
approaches, for the case of bandlimited systems.


2. System reconstruction from any pair
of horizontal slices of the HOS of the
system output

We define a horizontal slice of the n-th order spectrum
$C_n^x(\omega_1, \omega_2, \ldots, \omega_n)$ of a signal x(n) as the
one-dimensional sequence that arises when we fix all indices
$\omega_2, \ldots, \omega_n$ to certain real numbers, and allow
$\omega_1$ to take all possible values in $(-\pi, \pi]$. Throughout
the paper, the term "slice" will be used instead
of "horizontal slice". The distance between two slices
$C_n^x(\omega, \alpha_1, \ldots, \alpha_{n-1})$ and
$C_n^x(\omega, \beta_1, \ldots, \beta_{n-1})$ is defined
as the $\ell_1$-norm of the vector $\alpha - \beta$, i.e.,
$\|\alpha - \beta\|_1 = \sum_{i=1}^{n-1} |\alpha_i - \beta_i|$, where
$\alpha = [\alpha_1, \ldots, \alpha_{n-1}]^T$ and
$\beta = [\beta_1, \ldots, \beta_{n-1}]^T$.

In this paper we consider reconstruction from third-order
spectra. A generalization of the results to the
n-th order spectra case can be found in [7]. Consider
a stationary process x(n) given by:

$$x(n) = e(n) * h(n) + w(n), \qquad (1)$$

where e(n) is an i.i.d. non-Gaussian process with
zero mean and finite n-th order cumulant $\gamma_n^e \neq 0$, for
n > 2; w(n) is a stationary zero-mean Gaussian process
of unknown covariance which is assumed independent
of e(n); h(n) is the impulse response of an exponentially
stable, generally mixed-phase, complex LTI system
which has to be estimated from the output x(n).
It is initially assumed that h(n) does not have zeros
on the unit circle; however, this assumption is relaxed
later.

The frequency-domain bispectrum of x(n) is given
by

$$C_3^x(\omega_1, \omega_2) = \gamma_3^e\, H(\omega_1) H(\omega_2) H(-\omega_1 - \omega_2), \qquad (2)$$

with $H(\omega)$ denoting the frequency response of the system.
The following proposition holds:

Proposition 1 For the process x(n) described by
eq. (1), h(n) is always identifiable, within a (complex)
constant and a circular shift, from any two slices of the
discretized output bispectrum, i.e., $C_3^x(\frac{2\pi}{N}k, \frac{2\pi}{N}l_1)$ and
$C_3^x(\frac{2\pi}{N}k, \frac{2\pi}{N}l_2)$, $k = 0, \ldots, N-1$, if and only if N and
$r = |l_1 - l_2|$ are relatively prime integers. If h(n) is real,
then it is identifiable, within a constant and a circular
shift, based on a single slice of the discretized output
bispectrum, i.e., $C_3^x(\frac{2\pi}{N}k, \frac{2\pi}{N}l)$, if and only if N and
r = 2l are coprime integers.

The proof of this proposition can be found in [7].


3.  The  reconstruction  method 


By evaluating (2) at discrete frequencies $\omega = \frac{2\pi}{N}k$,
$k \in [0, \ldots, N-1]$, we obtain the discrete bispectrum
of x(n), i.e.,

$$C_3^x(k, l) = \gamma_3^e\, H(k) H(l) H(-k-l). \qquad (3)$$

Let us consider two slices of the discrete bispectrum
at distance r, i.e. slices (:, l) and (:, l + r), with l
arbitrarily chosen, i.e.,

$$C_3^x(k, l) = \gamma_3^e\, H(k) H(l) H(-k-l)$$
$$C_3^x(k, l+r) = \gamma_3^e\, H(k) H(l+r) H(-k-l-r) \qquad (4)$$

Taking natural logarithms of both sides in (4) and subtracting
we get:

$$\log C_3^x(k, l) - \log C_3^x(k, l+r) = \log\frac{H(l)}{H(l+r)} + \log\frac{H(-k-l)}{H(-k-l-r)} \qquad (5)$$

Substituting m = -k - l in (5) we get:

$$\log H(m) = \log H(m-r) + \log H(l+r) - \log H(l) + \log C_3^x(-m-l, l) - \log C_3^x(-m-l, l+r), \qquad (6)$$

where it can be shown that:

$$\log H(l+r) - \log H(l) = \frac{1}{N}\sum_{k=0}^{N-1}\left[\log C_3^x(k, l+r) - \log C_3^x(k, l)\right]. \qquad (7)$$
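The step leading to (7) is stated without proof. A short sketch of the argument, assuming that the discrete frequency response H(k) is N-periodic in its index and that all logarithms are taken on a consistent branch, is to sum the difference of the two log-slices over one full period:

\begin{align*}
\sum_{k=0}^{N-1}\left[\log C_3^x(k, l+r) - \log C_3^x(k, l)\right]
&= N\left[\log H(l+r) - \log H(l)\right]
 + \sum_{k=0}^{N-1}\left[\log H(-k-l-r) - \log H(-k-l)\right] \\
&= N\left[\log H(l+r) - \log H(l)\right].
\end{align*}

The last sum vanishes because both of its terms run over a complete period of $\log H$, only in a different order. Dividing by N gives (7).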
From (6) and (7) we would be able to calculate the
frequency response of the system recursively, provided
that the initial conditions $\{\log H(k),\ k = 0, 1, \ldots, r-1\}$
were known a priori. However, a solution can still be
obtained without the need for any a priori information.
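The recursion can be illustrated with a toy example. The sketch below assumes a real, strictly positive frequency response so that the logarithm is single-valued, sets the input cumulant to one, and supplies the r initial values log H(0), ..., log H(r-1) directly. It is only meant to show how two noise-free slices determine the remaining samples; the function names are hypothetical.

```python
import math


def bispectrum(H, k, l):
    """Noise-free discrete bispectrum C(k, l) = H(k) H(l) H(-k-l), unit cumulant assumed."""
    n = len(H)
    return H[k % n] * H[l % n] * H[(-k - l) % n]


def reconstruct(C, n, l, r, log_h_init):
    """Recover H(0..n-1) from slices (:, l) and (:, l + r) given log H(0..r-1)."""
    # Average of the log-slice difference gives log H(l + r) - log H(l)
    c_lr = sum(math.log(C(k, l + r)) - math.log(C(k, l)) for k in range(n)) / n
    log_h = list(log_h_init)
    for m in range(r, n):
        # Recursion in m with step r, using the two slices at frequency -m - l
        log_h.append(log_h[m - r] + c_lr
                     + math.log(C(-m - l, l)) - math.log(C(-m - l, l + r)))
    return [math.exp(v) for v in log_h]


H_true = [1.0, 2.0, 0.5, 3.0, 1.5]
H_est = reconstruct(lambda k, l: bispectrum(H_true, k, l), 5, 2, 1,
                    [math.log(H_true[0])])
```

With exact slices the recursion reproduces `H_true` up to floating-point error. With estimated slices, errors in early samples propagate through the recursion.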

Let $\mathbf{h}_l = [\log H(1), \ldots, \log H(N-1)]^T$ be the
(N - 1) x 1 vector of the unknown samples of the logarithm
of the frequency response of the system. By substituting
s = m - r in (6), and letting s take the values
0, 1, ..., N - 2, we form the linear system of equations:

$$\mathbf{A}\mathbf{h}_l = \mathbf{c}, \qquad (8)$$


where $\mathbf{c}$ is an (N - 1) x 1 vector of bispectrum values
along the slices (:, l) and (:, l + r), with

$$c_i = \log C_3^x(-i-r-l, l) - \log C_3^x(-i-r-l, l+r) + c_{l,r} \qquad (9)$$

where i = 0, 1, ..., N - 2, and

$$c_{l,r} = \log H(l+r) - \log H(l). \qquad (10)$$

Matrix A is a sparse matrix with special structure; it
is bidiagonal if r = 1 and tridiagonal otherwise, and
contains 1's and (-1)'s. It can be proved [7] that
matrix A is nonsingular if and only if N and r are
relatively prime integers, and that if A is nonsingular,
then det A = 1.
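The coprimality condition can be checked numerically for small grid sizes. The sketch below builds one plausible version of A, assumed rather than taken from [7]: row s encodes log H(s + r) - log H(s) with indices modulo N, and the column belonging to log H(0) is dropped because the response is only recovered up to a scalar. With this row ordering the sign of the determinant may differ from the convention used in the text, so only its absolute value is compared.

```python
from fractions import Fraction
from math import gcd


def build_A(n, r):
    """(n-1) x (n-1) matrix; row s has +1 at index (s + r) mod n and -1 at index s."""
    A = [[0] * (n - 1) for _ in range(n - 1)]
    for s in range(n - 1):
        plus = (s + r) % n
        if plus != 0:               # column of log H(0) is dropped
            A[s][plus - 1] += 1
        if s != 0:
            A[s][s - 1] -= 1
    return A


def determinant(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, det = len(M), Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det
        det *= M[c][c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            if f != 0:
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return det


def is_identifiable(n, r):
    """True when the assumed matrix A for grid size n and slice distance r is nonsingular."""
    return determinant(build_A(n, r)) != 0
```

For N = 64, as used later in the simulations, every odd slice distance r passes the test while every even one fails, which matches the condition gcd(N, r) = 1.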

In this method, the logarithm of a pair of bispectrum
slices is used to recover the logarithm of the frequency
response of a system. Although the phase of
the bispectrum appears implicitly in the expressions,
only the principal argument of the bispectrum is actually
needed. To see that, let $\hat{\psi}(k, l)$ denote the principal
argument of the phase sample $\psi(k, l)$. Then it holds:

$$\psi(k, l) = \hat{\psi}(k, l) + 2\pi I(k, l), \qquad (11)$$
where I(k, l) is an integer function of k, l. The solution
of (8), when $\mathbf{c}$ is computed based on $\hat{\psi}(k, l)$, becomes:

$$\mathbf{h}_l = \mathbf{A}^{-1}\left(\mathbf{c} + i 2\pi\, \mathbf{I}_1(l) + i \frac{2\pi}{N}\, \mathbf{I}_2(l)\right)$$

where $\mathbf{I}_1(l)$, $\mathbf{I}_2(l)$ are vectors of integer values.
However, since det(A) = 1, the reconstructed frequency
response of the system will only differ from the true one
by a complex exponential factor of the form $e^{i\frac{2\pi}{N} k I(l)}$,
where I(l) is an integer function of l, which corresponds
to a circular shift of the impulse response of the system.
Therefore, h(n) is reconstructed within a multiplicative
scalar and a circular shift, as stated previously.

Throughout the derivation of the method it has been
assumed that the system h(n) does not have zeros
on the unit circle, since otherwise H(k) = 0 could occur
for some k. However, if $C_3^x(k, l) = 0$ for some (k, l), then
we can change the spacing between samples, or equivalently
re-estimate the bispectrum on a different grid of frequency
points to overcome that problem, as suggested in
[1] and [8].

The procedure outlined in this Section is valid for
any pair of bispectrum slices, subject to the condition
given in Proposition 1. Therefore, by using different
pairs of slices, we can average the reconstructed
systems in the time-domain (after scaling and shifting
them appropriately), thus reducing the effects of noise
and finite data lengths in the estimation of cumulants.


4.  Simulations 

In this Section we demonstrate the performance of
the proposed method and compare it to the methods of
[1] (BLW) and [8] (RG) for the reconstruction of bandlimited
systems. The BLW and RG methods use several
bispectrum slices within the bispectrum principal
domain and are applicable to the reconstruction of
bandlimited systems. Since the RG method was developed
for the reconstruction of real systems, comparisons are
performed for a real system.

We implemented all methods using the bispectrum
of the system output for simplicity. We generated the
input process e(n) as an i.i.d. sequence with zero mean
and nonzero skewness, and added zero-mean, Gaussian
noise to the output of the system, at various signal-to-noise
ratios (SNRs). The bispectrum of the output
signal was estimated using the indirect method.
Data of length L were segmented into non-overlapping
records of length 256 symbols. The third-order cumulants
of each record were estimated in a square grid of
(2M - 1) x (2M - 1) lags, and then averaged over all
L/256 records. Finally, an N x N two-dimensional FFT
was applied to the averaged cumulants to obtain the
discrete bispectrum of the output process.
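The cumulant-averaging stage of the indirect method can be sketched as follows. The code assumes real-valued data, removes the mean of each record, and uses a biased estimate (division by the record length); these are common choices that the text does not specify. The final N x N two-dimensional FFT is omitted, and the function names are illustrative.

```python
def third_order_cumulants(record, max_lag):
    """Biased third-order cumulant estimates c(t1, t2) for |t1|, |t2| <= max_lag."""
    n = len(record)
    mean = sum(record) / n
    x = [v - mean for v in record]              # zero-mean record
    c = {}
    for t1 in range(-max_lag, max_lag + 1):
        for t2 in range(-max_lag, max_lag + 1):
            lo = max(0, -t1, -t2)               # keep all three indices in range
            hi = min(n, n - t1, n - t2)
            c[(t1, t2)] = sum(x[i] * x[i + t1] * x[i + t2]
                              for i in range(lo, hi)) / n
    return c


def averaged_cumulants(data, record_len, max_lag):
    """Split data into non-overlapping records and average their cumulant estimates."""
    starts = range(0, len(data) - record_len + 1, record_len)
    estimates = [third_order_cumulants(data[s:s + record_len], max_lag)
                 for s in starts]
    return {lag: sum(e[lag] for e in estimates) / len(estimates)
            for lag in estimates[0]}
```

Choosing `max_lag = M - 1` yields the (2M - 1) x (2M - 1) lag grid described above, and `record_len = 256` reproduces the segmentation used in the simulations.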

Although the system considered was real-valued, we
used two slices for the reconstruction procedure instead
of only one slice (see Proposition 1), since an FFT size
of N = 64, a power of 2, was used in order to speed
up computations. The reconstruction procedure was
repeated using several different pairs of slices, and the
estimated systems were averaged in the time-domain
in order to reduce noise effects.

The comparison method BLW determines the frequency
response of the unknown system except for a
linear-phase term governed by $\phi(1)$, where $\phi(\omega)$ is the
phase response. Although this is not a drawback, it creates
representation problems for the time-domain recovered
signal h(n) if $\phi(1)$ is not an integer, since it then
corresponds to a non-integer time-delay. To circumvent
that problem, we supplied the correct value of $\phi(1)$ to
the BLW algorithm.

We  generated  a  bandpass  system  as: 

h{t)  =  0.771*1  cos(27r0.49<)+0.8(0.65)l‘lsm(27r0.38f+J) 

0 

where -4.5 ≤ t ≤ 3 seconds. A discrete-time signal
h(n) of length 16 symbols was generated by sampling
h(t) every 0.5 seconds. Since the true length of the
signal h(n) is assumed unknown, we used M = 20 in
the computation of third-order cumulants.

Although any pair of slices that satisfies the
identifiability conditions should provide identical
results, it was found in practice that some slices provide
better results than others. Let us define the term
"frequency content" of slice l as

$$\int_{-\pi}^{\pi} \left|C_3^x(\omega, l)\right| d\omega$$

Based on our experience with simulations, we found
that the use of bispectrum slices with low frequency
content resulted in poor performance; the opposite was
also found to be true. In order to select the slices for
better reconstruction, we ran 100 different simulations
and computed the frequency content of each slice at
each run. The average frequency content over all runs
is shown in Fig. 1, where the shaded area indicates standard
deviation. It can be seen that slices 11-18 exhibit
a consistently higher frequency content than all others.
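A simple way to rank slices by frequency content on a discrete grid is sketched below. It approximates the integral by a Riemann sum over the N frequency samples of each slice and assumes the discrete bispectrum is stored as a nested list `bispec[k][l]`; this layout and the function names are illustrative.

```python
import math


def frequency_content(bispec, l):
    """Riemann-sum approximation of the integral of |C(w, l)| over one period."""
    n = len(bispec)
    return (2.0 * math.pi / n) * sum(abs(bispec[k][l]) for k in range(n))


def rank_slices(bispec):
    """Slice indices ordered from highest to lowest frequency content."""
    n_slices = len(bispec[0])
    return sorted(range(n_slices),
                  key=lambda l: frequency_content(bispec, l),
                  reverse=True)
```

Averaging `frequency_content` over many simulation runs, as done for Fig. 1, and keeping the top-ranked slices that also satisfy the coprimality condition of Proposition 1 gives one possible selection rule.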

Then we ran 100 simulations of the proposed
method, using slices 13-14 (corresponding to "good"
slices) and 23-24 (corresponding to "bad" slices), and
the results are shown in Fig. 2 for L = 2048 output
samples and SNR equal to ∞ and 10 dB. Clearly, the
use of slices 13-14 produces superior results, both in
bias and variance, compared to those of slices 23-24.

Next, we ran 100 simulations of the proposed
method, using averaging over slices 10-17 (in pairs of
consecutive slices), and of methods BLW and RG. The
results are shown in Figs. 3 and 4 for SNR of ∞ and
10 dB respectively. In both figures, graphs (a), (b)
and (c) correspond to L = 1024, while (d), (e) and
(f) correspond to L = 2048 output samples used. The
graphs on the left, center and right correspond to the
proposed, BLW and RG methods respectively. It can be
seen that method RG performs better than BLW both in
terms of bias and variance, but both methods are
outperformed by the proposed one. This can be attributed
to the fact that the actual system h(n) is bandpass;
therefore its output bispectrum contains regions of low
magnitude, where the useful signal information is
significantly corrupted by noise. The inclusion of such
regions in the reconstruction procedure is responsible
for poor performance. On the other hand, selection of
regions with higher signal information only, as in the
proposed method, leads to better results.


Figure 1. Frequency content for slices 1-33
of the output bispectrum of the system.
Circles and solid line represent the average
over 100 simulations, while shaded area indicates
sample standard deviation.
(Horizontal axis: slice index; vertical axis: frequency content.)


5.  Conclusions 

A non-parametric method for system reconstruction based on HOS slices was
presented. It was shown that an arbitrary system can be uniquely identified
based on any two slices of the output discretized n-th order spectrum, with
n ≥ 3, as long as the distance between the slices and the grid size satisfy
a simple condition. By using the logarithm of two n-th order spectrum slices,
the logarithm of the frequency response of the system was obtained as the
solution of a linear system of equations. A mechanism to select slices that
result in improved system estimates was proposed. Simulation examples
confirmed the superiority of the proposed method as compared to existing
schemes, when tested on bandlimited systems. The flexibility of the proposed
method in selecting the slices to be used for reconstruction allows one to
avoid regions of the n-th order spectrum where the useful system information
is limited or distorted by noise.
Figure 2. Results of the proposed method using slices 13-14 and 23-24 of the
output bispectrum (panels labeled "SLICES: 13-14" and "SLICES: 23-24") with
SNR = ∞ dB ((a) and (b)) and SNR = 10 dB ((c) and (d)), and L = 2048
samples. The actual system is in solid lines, the average over 100 estimates
in dash-dotted lines, and the shaded area indicates standard deviation.
Figure 3. Comparison of the proposed, BLW and RG methods for L = 1024 ((a),
(b) and (c)) and L = 2048 output samples ((d), (e) and (f)), and
SNR = ∞ dB. The actual system is in solid lines, the average over 100
estimates in dash-dotted lines, and the shaded area indicates standard
deviation.


Figure 4. Comparison of the proposed, BLW and RG methods for L = 1024 ((a),
(b) and (c)) and L = 2048 output samples ((d), (e) and (f)), and
SNR = 10 dB. The actual system is in solid lines, the average over 100
estimates in dash-dotted lines, and the shaded area indicates standard
deviation.
Abstract

¹A method is presented for blind system identification in an impulsive
environment, where the system output is described by a symmetric α-stable
(SαS) law. The method employs either the phase or the magnitude of the
recently proposed α-Spectrum [1] of the system output. It is much simpler
than the method proposed in [1] that also relies on the phase or magnitude
of the α-Spectrum, and provides the system cepstrum via closed-form
expressions.

1. Introduction

The Gaussian distribution is frequently used in signal processing to model
data. In most cases it is a reasonable assumption and can also be justified
by the central limit theorem. Many signals and noises encountered in the
real world, however, are non-Gaussian. This paper considers non-Gaussian
phenomena that can be characterized as impulsive, such as lightning in the
atmosphere, switching transients in power lines, car ignitions, accidental
hits in telephone lines, and underwater acoustic signals [5]-[7]. Signals in
this class exhibit sharp spikes or bursts corresponding to values that
significantly deviate from the process mean. This class of signals can be
described by an α-stable (0 < α ≤ 2) model [4]. The main difference between
the stable and the Gaussian densities is that the tails of the former are
heavier than those of the latter. Frequently used members of the stable
family are the Gaussian (α = 2) and Cauchy (α = 1) distributions.

The problem of modeling SαS processes has recently received attention
[9],[1]. The SαS process is modeled as the response of an LTI system excited
by white symmetric α-stable noise. In [1] the notion of the α-Spectrum has
been introduced and used to identify the system frequency response based on
the system output only. In this paper we propose new methods for system
identification based on the α-Spectrum of the system output, that rely on a
simple relationship between the cepstrum of the system impulse response and
the inverse Fourier transform of the phase, or the log-magnitude, of the
α-Spectrum. The main advantage of the proposed approach over the ones in [1]
is simplicity of expressions and computations, which renders them less
susceptible to computation errors.

¹This work was supported by NSF under grant MIP-9553227.

2.  Mathematical  Background 

Stable distributions are the only class of distributions that can be the
limit distributions of sums of i.i.d. random variables (generalized central
limit theorem). Their density function does not have a closed form in
general, thus they are usually described by their characteristic function:

$$\Phi(\omega) = E\{\exp(j\omega x)\} =
\begin{cases}
\exp\{ja\omega - \gamma|\omega|^{\alpha}(1 - j\beta\,\mathrm{sign}(\omega)\tan\frac{\alpha\pi}{2})\}, & \alpha \neq 1, \\
\exp\{ja\omega - \gamma|\omega|(1 - j\beta\,\frac{2}{\pi}\,\mathrm{sign}(\omega)\ln|\omega|)\}, & \alpha = 1,
\end{cases} \qquad (1)$$

where α ∈ (0, 2] is the characteristic exponent, β ∈ [-1, 1] is the symmetry
index, γ > 0 is the spread parameter (dispersion), and a is the location
parameter. A symmetric α-stable (SαS) process is described by the
characteristic function given in (1) with β = 0.

A standard SαS process is a SαS process with γ = 1. A complex random
variable is SαS if its real and imaginary parts are jointly SαS. An
important complex SαS random variable is the isotropic complex SαS random
variable, with characteristic function
$\Phi(\omega_1, \omega_2) = \exp\{-\gamma(\omega_1^2 + \omega_2^2)^{\alpha/2}\}$.
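As an illustration of the SαS model above, the following Python sketch draws SαS samples with the Chambers-Mallows-Stuck transformation. This generator is not taken from the paper; the function names `sas_from_uniform_exp` and `sas_samples` are hypothetical, and the scaling by γ^(1/α) assumes the symmetric characteristic function exp(-γ|ω|^α).

```python
import numpy as np

def sas_from_uniform_exp(alpha, u, w, gamma=1.0):
    """Map U ~ Uniform(-pi/2, pi/2) and W ~ Exp(1) to SaS variates.

    Chambers-Mallows-Stuck transformation for the symmetric case (beta = 0).
    The result has characteristic function exp(-gamma * |omega|**alpha).
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    u = np.asarray(u, dtype=float)
    w = np.asarray(w, dtype=float)
    if alpha == 1.0:
        x = np.tan(u)  # Cauchy case
    else:
        x = (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
             * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return gamma ** (1.0 / alpha) * x

def sas_samples(alpha, size, gamma=1.0, seed=0):
    """Draw `size` i.i.d. SaS samples (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return sas_from_uniform_exp(alpha, u, w, gamma)

# Example: 5,000 standard SaS samples with alpha = 1.5.
x = sas_samples(1.5, 5000)
```

For α = 2 the transformation reduces to 2 sin(U) √W, a Gaussian variate with variance 2, which agrees with the characteristic function exp(-ω²).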
α-stable processes have finite moments of order p, where -1 < p < α. For
processes with α < 2, second- or higher-order moments are not defined.
Fractional-order moments, however, exist and their definitions can be found
in [4]. Similarly, for such processes the covariance is not defined. Its
fractional-order equivalent, the covariation, is defined (only for stable
processes with α > 1) and its definition can be found in [4]. The
covariation estimator of any two jointly SαS random variables X and Y is [1]:

$$[X, Y]_\alpha = \frac{E\{X\,Y^{\langle p-1\rangle}\}}{E\{|Y|^{p}\}}\,[Y, Y]_\alpha, \qquad (2)$$

where

$$Y^{\langle p\rangle} =
\begin{cases}
|Y|^{p-1}\,Y^{*}, & Y \text{ is complex}, \\
|Y|^{p}\,\mathrm{sign}(Y), & Y \text{ is real},
\end{cases} \qquad (3)$$

and [Y, Y]_α equals the dispersion of Y when Y is real or isotropic complex,
while it equals some unknown real number in all other cases for Y [11].

The following are some properties of the covariation:

1) linearity holds in the first argument: if X_1, X_2, Y are jointly SαS,
then

$$[aX_1 + bX_2, Y]_\alpha = a[X_1, Y]_\alpha + b[X_2, Y]_\alpha. \qquad (4)$$

2) pseudo-linearity holds in the second argument: if Y_1, Y_2 are
independent and X, Y_1, Y_2 are jointly SαS, then

$$[X, aY_1 + bY_2]_\alpha = a^{\langle\alpha-1\rangle}[X, Y_1]_\alpha + b^{\langle\alpha-1\rangle}[X, Y_2]_\alpha. \qquad (5)$$

3) if X, Y are independent, then [X, Y]_α = 0.
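The fractional lower-order moment ratio used by the covariation estimator can be sketched in Python as follows. This is only a minimal illustration, assuming the signed-power convention Y^<p> = |Y|^(p-1) Y* (which reduces to |Y|^p sign(Y) for real Y); the names `signed_power` and `flom_covariation_coefficient` are hypothetical and do not come from [1].

```python
import numpy as np

def signed_power(y, p):
    """Y^<p>: |Y|^p sign(Y) for real Y, |Y|^(p-1) conj(Y) for complex Y."""
    y = np.asarray(y)
    is_complex = np.iscomplexobj(y)
    mag = np.abs(y)
    out = np.zeros(y.shape, dtype=complex if is_complex else float)
    nz = mag > 0  # avoid 0 ** negative exponent
    if is_complex:
        out[nz] = mag[nz] ** (p - 1.0) * np.conj(y[nz])
    else:
        out[nz] = mag[nz] ** p * np.sign(y[nz])
    return out

def flom_covariation_coefficient(x, y, p):
    """Sample estimate of E{X Y^<p-1>} / E{|Y|^p} = [X,Y]_a / [Y,Y]_a."""
    x = np.asarray(x)
    y = np.asarray(y)
    num = np.mean(x * signed_power(y, p - 1.0))
    den = np.mean(np.abs(y) ** p)
    return num / den

# Sanity check: if X = c * Y, the ratio equals c for any admissible p.
y = np.array([-2.0, -0.5, 1.0, 3.0])
coeff = flom_covariation_coefficient(0.7 * y, y, p=0.6)
```

The ratio is linear in its first argument, mirroring property 1) above; multiplying it by an estimate of [Y, Y]_α (when that dispersion is available) gives the covariation itself.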

3.  Blind  System  identification 

Let y(n) be the output of an LTI system excited by an i.i.d. real or
isotropic complex standard SαS random input x(n). Let h(n), H(z) be the
system impulse response and transfer function, respectively.

The α-Spectrum of the system output was defined in [1] to be the covariation
between y(n) and a function of y(n), i.e.,

$$S_\alpha(z) = [y(n), w_z(n)]_\alpha, \qquad (6)$$

where

$$w_z(n) = \sum_{i=-q}^{q} y(n+i)\,z^{-i}, \qquad (7)$$

with q being the length of the FIR approximation of h(n).

In [1] it was also shown that the α-Spectrum is related to the system
transfer function as:

$$S_\alpha(z) = H(z|z|^{-\alpha})\,\frac{|H(z)|^{\alpha}}{H(z)}. \qquad (8)$$

Evaluating the α-Spectrum S_α(z) on the unit circle (|z| = 1) yields

$$S_\alpha(e^{j\omega}) = |H(e^{j\omega})|^{\alpha}, \qquad (9)$$

from which we get the system magnitude response only.

In order to recover the channel, we need to evaluate the α-Spectrum on
circles other than the unit circle, i.e., on a circle with radius r (r ≠ 1).
Taking the logarithm of both sides of (8) yields:

$$\begin{aligned}
\log S_\alpha(re^{j\omega}) &= \log|S_\alpha(re^{j\omega})| + j\arg\{S_\alpha(re^{j\omega})\} \\
&= \log|H(r^{1-\alpha}e^{j\omega})| + (\alpha-1)\log|H(re^{j\omega})| \\
&\quad + j\big(\phi(r^{1-\alpha}e^{j\omega}) - \phi(re^{j\omega})\big), \qquad (10)
\end{aligned}$$

where φ(·) is the system phase response. In order to ensure the existence of
the complex cepstrum of the α-Spectrum, r and r^{1-α} must be taken in the
range (R_min, R_max), where R_min and R_max denote respectively the location
of the zeros with the minimum and the maximum magnitude.

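Let f_1(n), f_2(n) denote respectively the inverse Fourier transform of the
phase of the α-Spectrum evaluated on circles with radii r_1 and r_2. Let us
define:

$$\begin{aligned}
f_i(n) &= F^{-1}\{\arg\{S_\alpha(r_i e^{j\omega})\}\} \\
&= F^{-1}\{\mathrm{Im}\{\log H(r_i^{1-\alpha}e^{j\omega})\,H^{*}(r_i e^{j\omega})\}\} \\
&= \frac{a_i(n)\hat{h}(n) + b_i(n)\hat{h}(-n)}{2j}, \quad i = 1, 2, \qquad (11)
\end{aligned}$$

where

$$a_i(n) = r_i^{(\alpha-1)n} - r_i^{-n}, \qquad (12)$$

$$b_i(n) = r_i^{n} - r_i^{(1-\alpha)n}, \qquad (13)$$

and ĥ(n) denotes the complex cepstrum of the system impulse response. In (11)
we used the fact that the complex cepstrum corresponding to H(re^{jω}) equals
r^{-n}ĥ(n).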

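Similarly, we define:

$$\begin{aligned}
g_i(n) &= F^{-1}\{\log|S_\alpha(r_i e^{j\omega})|\} \\
&= F^{-1}\{\log|H(r_i^{1-\alpha}e^{j\omega})| + (\alpha-1)\log|H(r_i e^{j\omega})|\} \\
&= \frac{a'_i(n)\hat{h}(n) + b'_i(n)\hat{h}(-n)}{2}, \quad i = 1, 2, \qquad (14)
\end{aligned}$$

where

$$a'_i(n) = r_i^{(\alpha-1)n} + (\alpha-1)\,r_i^{-n}, \qquad (15)$$

$$b'_i(n) = r_i^{(1-\alpha)n} + (\alpha-1)\,r_i^{n}. \qquad (16)$$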

Thus, the cepstrum of the system can be reconstructed using the α-Spectrum
phase by:

$$\hat{h}(n) = \frac{2j\,[b_2(n)f_1(n) - b_1(n)f_2(n)]}{a_1(n)b_2(n) - a_2(n)b_1(n)}, \quad \text{for } n \neq 0, \qquad (17)$$

or using the α-Spectrum magnitude by:

$$\hat{h}(n) = \frac{2\,[b'_2(n)g_1(n) - b'_1(n)g_2(n)]}{a'_1(n)b'_2(n) - a'_2(n)b'_1(n)}, \quad \text{for } n \neq 0. \qquad (18)$$


If α = 2 the denominators of both (17) and (18) become zero. Thus, for α = 2
system identification is not possible, which comes as no surprise since in
this case the system input is a Gaussian process. It can be verified that
when α ≠ 2, the denominator of (17) is different from zero for any value of
n, and for any positive r_1 and r_2 not equal to 1, as long as r_1 ≠ r_2.
Thus, reconstruction from the α-Spectrum phase can be carried out based on
any two circles with radii r_1 and r_2 (r_1 ≠ r_2). For α > 1 the
denominator of (18) does become zero depending on the values of r_1, r_2 and
n. It can, however, be verified that it is nonzero for every n, as long as
r_1 ≠ r_2.

From its complex cepstrum, h(n) can be recovered to within a scalar and a
delay through inverse cepstrum operations.
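The phase-based reconstruction can be checked numerically with a short Python sketch. It assumes a real cepstrum and the coefficient definitions a_i(n) = r_i^((α-1)n) - r_i^(-n) and b_i(n) = r_i^n - r_i^((1-α)n), which follow from the scaling property r^(-n) ĥ(n) quoted in the text; all function names are hypothetical. The sketch synthesizes f_1(n) and f_2(n) from a known cepstrum and then inverts the pair with the closed form (17).

```python
import numpy as np

def phase_coeffs(r, alpha, n):
    """Coefficients a(n), b(n) multiplying h_hat(n) and h_hat(-n)."""
    a = r ** ((alpha - 1.0) * n) - r ** (-n)
    b = r ** n - r ** ((1.0 - alpha) * n)
    return a, b

def synth_phase_sequence(h_hat, h_hat_neg, r, alpha, n):
    """Forward model: f(n) = [a(n) h_hat(n) + b(n) h_hat(-n)] / (2j)."""
    a, b = phase_coeffs(r, alpha, n)
    return (a * h_hat + b * h_hat_neg) / 2j

def denominator(r1, r2, alpha, n):
    """Determinant a1*b2 - a2*b1 of the 2x2 system solved in (17)."""
    a1, b1 = phase_coeffs(r1, alpha, n)
    a2, b2 = phase_coeffs(r2, alpha, n)
    return a1 * b2 - a2 * b1

def cepstrum_from_phase(f1, f2, r1, r2, alpha, n):
    """Closed-form solution for h_hat(n), n != 0."""
    _, b1 = phase_coeffs(r1, alpha, n)
    _, b2 = phase_coeffs(r2, alpha, n)
    return 2j * (b2 * f1 - b1 * f2) / denominator(r1, r2, alpha, n)

# Illustrative values only (not from the paper's simulations).
n = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
h_hat = np.array([0.05, -0.1, 0.2, 0.4, -0.3, 0.1])  # h_hat(n) on the grid n
h_hat_neg = h_hat[::-1]                               # h_hat(-n)
alpha, r1, r2 = 1.5, 1.2, 0.8
f1 = synth_phase_sequence(h_hat, h_hat_neg, r1, alpha, n)
f2 = synth_phase_sequence(h_hat, h_hat_neg, r2, alpha, n)
recovered = cepstrum_from_phase(f1, f2, r1, r2, alpha, n)
```

With α = 2 the two coefficient sequences coincide, so `denominator` returns zeros, which matches the remark that identification is impossible for a Gaussian input.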


Implementation  Issues 


• The covariation needed in the estimation of the α-Spectrum is obtained as:

$$[y(n), w_z(n)]_\alpha = \frac{E\{y(n)\,w_z(n)^{\langle p-1\rangle}\}}{E\{|w_z(n)|^{p}\}}\,[w_z(n), w_z(n)]_\alpha. \qquad (19)$$

In the case of a real SαS input, w_z(n) is, in general, neither real nor
isotropic complex, thus [w_z(n), w_z(n)]_α cannot be estimated [11]. In such
a case, although reconstruction from the α-Spectrum magnitude would not be
possible, reconstruction from the α-Spectrum phase would be possible, since
this unknown term is real and thus does not contribute to the phase of the
α-Spectrum required in (17).


• The fractional moment order, p, needed in the covariation estimate, can be
taken to be anywhere in the range (0, α). However, it was demonstrated in
[10] that when p is in the range (1/2, α/2), (19) leads to better
covariation estimates.

• When the length q required in (7) is not known, a length overestimate can
be used.

• For the computation of the phase of the α-Spectrum, phase unwrapping is
required. The time domain equivalent of the α-Spectrum is real, hence phase
unwrapping can be carried out as usual.

• In practice, we should choose r_1 and r_2 close to 1 unless we have some
a priori knowledge about the positions of the poles and zeros of the system.

4.  Simulation  results 

In this section we demonstrate the validity of the algorithm proposed in
this paper. The channel was taken to be:

$$H(z) = 1.5924\,z^{-1}\,(1 - (0.5462 + 0.3758j)z^{-1})\,(1 - (0.5462 - 0.3758j)z^{-1})\,(1 + 0.628z), \qquad (20)$$

which corresponds to a nonminimum phase FIR system of order 3.
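A quick numerical check of the channel in (20) can be sketched in Python by expanding the factors into FIR coefficients and locating the zeros. This is only an illustration; the variable names are hypothetical, and the coefficients are ordered by ascending powers of z^(-1).

```python
import numpy as np

z1 = 0.5462 + 0.3758j

# (1 - z1 z^-1)(1 - conj(z1) z^-1): conjugate pair, real coefficients.
quad = np.convolve([1.0, -z1], [1.0, -np.conj(z1)])

# z^-1 (1 + 0.628 z) = 0.628 + z^-1.
lin = np.array([0.628, 1.0])

# FIR coefficients h(0), ..., h(3) of H(z).
h = np.real(1.5924 * np.convolve(quad, lin))

# Zeros of H(z): roots of h(0) z^3 + h(1) z^2 + h(2) z + h(3).
zeros = np.roots(h)
n_outside = int(np.sum(np.abs(zeros) > 1.0))
```

One zero lies outside the unit circle (at -1/0.628, roughly -1.59) and the conjugate pair lies inside, which is consistent with the statement that the system is a nonminimum phase FIR system of order 3.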

Two cases of input were considered: complex isotropic and real. In both
cases the input was taken to be white i.i.d., 5,000 samples long, generated
based on α = 1.5 and γ = 1. The system reconstruction was performed based on
20 independent input realizations. The covariation estimator was obtained
based on p = 0.6 for the first input and p = 0.7 for the second input, α was
taken to be equal to its true value, and the length of the FIR approximation
of h(n) needed in (7) was taken to be q = 10. The radii of the two circles
are r_1 = 1.2 and r_2 = […]. The estimation results of the 20 Monte Carlo
simulations for the complex isotropic input are shown in Fig. 1, while those
corresponding to the real input are shown in Fig. 2.

Another set of simulations was performed based on the estimated α. In each
Monte Carlo run, α was estimated via the method of [1]. The estimated phase
and magnitude response based on 20 Monte Carlo runs are shown in Fig. 3. The
mean of the estimated α was 1.3946.
Abstract

The problem of estimating the parameters of a non-causal ARMA system driven
by an unobservable input noise is addressed. We propose a method based on a
generalized version of the prediction error minimum variance approach and on
the maximum kurtosis property. Firstly, a spectrally equivalent (SE) model is
identified with the generalized minimum variance approach. Secondly, the
kurtosis allows us to identify the phase of the true model by localizing its
zeros and poles from the SE model. Finally, we propose a new method which is
a closed-loop form of the preceding method, improving the accuracy of the
parameter estimation and giving a better reconstruction of the estimated
model phase. Simulation results seem to confirm the good behavior of the
proposed methods compared to methods using higher-order statistics.

1.  Introduction 

This paper deals with parameter identification of a non-causal stable linear
time invariant (LTI) system driven by an unknown zero-mean, stationary,
non-Gaussian white noise (NGWN). Many methods, principally using
higher-order cumulants, have been proposed in the literature [8], [3], [7],
[5]. Here, we are interested in a method proposed by Boumahdi [1], which
uses the second-order statistics (SOS) information and the maximum kurtosis
property to identify non-causal MA or AR, or causal ARMA models. The
interest of this approach is that it provides parameter estimates with lower
variance than if higher-order statistics (HOS) are used.

We propose to extend this method to non-causal ARMA models and to improve
the estimation accuracy of this method in the sense of the error covariance
matrix of the parameter estimate. We first identify the spectrally
equivalent (SE) minimum phase model by using SOS, then we use the maximum
kurtosis property to recover the phase of the true model (non-causal MA, AR
or ARMA model). Our approach differs from that of Boumahdi [1] in that we
use a prediction error method (PEM) to identify the SE minimum phase model.
Specifically, we propose a generalization of the minimum variance approach
to the case of non-causal ARMA models by introducing a model separating the
causal and anti-causal parts. Finally, to improve the estimation accuracy,
we introduce a recurrence on the two steps of variance minimization and
kurtosis maximization, this being based on a simple idea.

We present in section 2 the ARMA models and the hypotheses used. In section
3, the minimum variance approach is detailed. In section 4, the maximum
kurtosis property is presented. We propose a new identification scheme in
section 5. Finally, simulation results are summarized in section 6. In
section 7, a conclusion is given.

2.  ARMA  models  and  hypotheses 

The observed process y = {y[n]}_{n∈Z} is modeled as the output of a discrete
non-causal stable LTI system of impulse response (IR) h_{θ_0} with
unobservable, zero-mean, stationary, non-Gaussian white input
x = {x[n]}_{n∈Z}:

$$y = h_{\theta_0} * x \qquad (1)$$

where * is the convolution sum and θ_0 is the actual parameter in the
parameterized form of the IR h_θ with Z-transform Zh_θ(z) defined by:

$$Zh_\theta(z) = \frac{1 + \sum_{i=1}^{q}\beta[i]\,z^{-i}}{1 + \sum_{i=1}^{p}a[i]\,z^{-i}} \qquad (2)$$

with θ = [β[1] … β[q] a[1] … a[p]].

The IR h_θ is supposed to be stable with stable inverse. So, we suppose that
the ARMA(p, q) model, of known orders p and q, is non-causal, free of
zero-pole cancellations, with no zeros or poles on the unit circle (UC).

Let $(p_i)_{1\le i\le n_a}$ ($(\tilde{p}_i)_{1\le i\le \bar{n}_a}$) be the
system poles inside (outside) the UC with $n_a + \bar{n}_a = p$, and let
$(z_i)_{1\le i\le n_b}$ ($(\tilde{z}_i)_{1\le i\le \bar{n}_b}$) be the system
zeros inside (outside) the UC with $n_b + \bar{n}_b = q$. We can express
Zh_θ(z) (2) by:

$$Zh_\theta(z) = G_\theta \cdot z^{\bar{n}_a - \bar{n}_b} \cdot Zh'_\theta(z) \qquad (3)$$

with $G_\theta = \prod_{i=1}^{\bar{n}_b}(-\tilde{z}_i) \,/\, \prod_{i=1}^{\bar{n}_a}(-\tilde{p}_i)$ and

$$Zh'_\theta(z) = \frac{\prod_{i=1}^{n_b}(1 - z_i z^{-1})}{\prod_{i=1}^{n_a}(1 - p_i z^{-1})} \cdot \frac{\prod_{i=1}^{\bar{n}_b}(1 - \tilde{z}_i^{-1} z)}{\prod_{i=1}^{\bar{n}_a}(1 - \tilde{p}_i^{-1} z)} = Zh'_{ca}(z) \cdot Zh'_{ac}(z^{-1}) \qquad (4)$$

where $Zh'_{ca}(z)$ ($Zh'_{ac}(z^{-1})$) is the causal (anti-causal) part of
the ARMA model, and $n_a$ and $\bar{n}_a$ ($n_b$ and $\bar{n}_b$) denote the
number of poles (zeros) of $Zh'_{ca}(z)$ and $Zh'_{ac}(z^{-1})$,
respectively.

Then the output process y can be modeled, equivalently to (1) and as shown
in Figure 1, by:

$$y = h'_{\theta_0} * x' \qquad (5)$$

where $Zx'(z) = G_{\theta_0} \cdot z^{\bar{n}_a - \bar{n}_b} \cdot Zx(z)$;
x' is then a scaled and shifted version of x.

Note that this alternate representation (4)-(5) of a non-causal non-minimum
phase ARMA system is used in [6] to develop maximum likelihood estimates of
ARMA models. Moreover, we show that this model has some interesting
properties.


Figure  1.  Model  of  the  system. 


Indeed, the power spectrum of y is given by:

$$S_y(\omega) = S_{x'}(\omega)\,|Zh'_{\theta_0}(e^{j\omega})|^{2} = \nu'\,|Zh'_{ca}(e^{j\omega})|^{2}\,|Zh'_{ac}(e^{-j\omega})|^{2}$$

where S_{x'}(ω) is the power spectrum of the white input x' and ν' its
variance.

We can easily show the following property:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\log S_y(\omega)\,d\omega = \log\nu' \qquad (6)$$

by using the fact that $\frac{1}{2\pi}\int_{-\pi}^{\pi}\log(1 + a\,e^{j\omega})\,d\omega = 0$
for all a ∈ C such that |a| < 1, where C denotes the set of complex numbers.
This formula is simply demonstrated by using the power series expansion of
the logarithm. It permits us to prove that
$\frac{1}{2\pi}\int_{-\pi}^{\pi}\log Zh'_{ca}(e^{j\omega})\,d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi}\log Zh'_{ac}(e^{-j\omega})\,d\omega = 0$,
and to obtain the relation (6). It is important to note that the integral of
the logarithm of the power spectral density of y is generally not equal to
the logarithm of the variance of x, except in the causal minimum phase case.

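Now, by using the property (6), we demonstrate that:

$$E\{y^{2}[n]\} \ge \nu' \qquad (7)$$

where E{y²[n]} is the variance of y.

Indeed, we have:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\log S_y(\omega)\,d\omega = \log\nu', \qquad \frac{1}{2\pi}\int_{-\pi}^{\pi} S_y(\omega)\,d\omega = E\{y^{2}[n]\}.$$

So, we obtain:

$$\frac{E\{y^{2}[n]\}}{\nu'} - 1 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[\frac{S_y(\omega)}{\nu'} - \log\frac{S_y(\omega)}{\nu'} - 1\right]d\omega.$$

But u - log u - 1 > 0 if u ≠ 1 and u - log u - 1 = 0 if u = 1. Thus
E{y²[n]}/ν' - 1 ≥ 0, hence E{y²[n]} ≥ ν', and we have E{y²[n]} = ν' if and
only if S_y(ω) = ν' for all ω, i.e., if Zh'_{θ_0}(z) is the transfer
function of an all-pass filter. Note that we cannot say anything about the
variance of the input x.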
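Property (6) and inequality (7) can be checked numerically with a small Python sketch. It is a simplified illustration restricted to an MA-type model built from causal factors (1 - z_i e^(-jω)) and anti-causal factors (1 - c_i e^(jω)), with all |z_i| < 1 and |c_i| < 1; the function name `spectrum_means` and the example values are hypothetical.

```python
import numpy as np

def spectrum_means(zeros_causal, zeros_anticausal, nu, n_grid=4096):
    """Return (1/2pi) * integral of log S_y and (1/2pi) * integral of S_y.

    S_y(w) = nu * |prod(1 - z e^{-jw})|^2 * |prod(1 - c e^{jw})|^2,
    with every z and c strictly inside the unit circle.
    The integrals are approximated by averages over a uniform grid.
    """
    w = 2.0 * np.pi * np.arange(n_grid) / n_grid
    ejw = np.exp(1j * w)
    H = np.ones(n_grid, dtype=complex)
    for z in zeros_causal:
        H *= 1.0 - z / ejw      # causal factor
    for c in zeros_anticausal:
        H *= 1.0 - c * ejw      # anti-causal factor
    S = nu * np.abs(H) ** 2
    return np.mean(np.log(S)), np.mean(S)

nu_prime = 2.0
mean_log_S, var_y = spectrum_means([0.5, -0.3], [0.8], nu_prime)
```

Under these assumptions `mean_log_S` should equal log ν' up to rounding, while `var_y` (the output variance) is at least ν', as stated by (7).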
3. Minimum variance criterion

For any parameter vector θ and any signal y, we define the sequences
e(θ) = {e(θ)[n]}_{n∈Z} and e'(θ) = {e'(θ)[n]}_{n∈Z} as in Figure 2, where
h'^{-1}_θ is the inverse of h'_θ for the convolution sum. The deconvolution
is performed by forward and backward filtering (see [7]).


Figure 2. Deconvolution scheme.


We can prove, similarly to the property (7), that

$$E\{e'^{2}(\theta)[n]\} \ge \nu'$$

where the equality is obtained if and only if $h'_{\theta_0} * h'^{-1}_{\theta}$
is the impulse response of an all-pass filter. But we cannot say anything
about the variance of e(θ), because G_θ depends on the zeros and poles of
h_θ which are outside the unit circle. (Obviously, second-order methods are
not sufficient to identify non-causal models.)

Then, the minimization of the following criterion:

$$J(\theta) = E\{e'^{2}(\theta)[n]\} \qquad (8)$$

gives at least […] solutions, p_r (p_c) and q_r (q_c) being the numbers of
real (complex) poles and zeros, respectively, of the true model
parameterized by θ_0. These solutions correspond to the SE models (obtained
by replacing poles and zeros of the actual model by their conjugate
inverses). But this is true only if the actual ARMA model does not contain
an all-pass factor.
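The notion of a spectrally equivalent model can be illustrated with a short Python sketch for the MA case. Reflecting a zero z_0 to its conjugate inverse 1/z_0* and scaling the gain by |z_0| leaves the magnitude response on the unit circle unchanged. The function name `minimum_phase_equivalent` and the example coefficients are hypothetical and are not taken from the paper.

```python
import numpy as np

def minimum_phase_equivalent(b):
    """Return the SE model of an MA filter with all zeros inside the UC.

    b holds the coefficients of b[0] + b[1] z^-1 + ... ; every zero outside
    the unit circle is replaced by its conjugate inverse, and the gain is
    multiplied by the modulus of each reflected zero.
    """
    b = np.asarray(b, dtype=float)
    zeros = np.roots(b)
    flip = np.abs(zeros) > 1.0
    new_zeros = np.where(flip, 1.0 / np.conj(zeros), zeros)
    gain = b[0] * np.prod(np.abs(zeros[flip]))
    return np.real_if_close(gain * np.poly(new_zeros))

# Illustrative MA(2) model with zeros at -2 and 0.5.
b_true = np.array([1.0, 1.5, -1.0])
b_se = minimum_phase_equivalent(b_true)

# Both models share the same magnitude response (hence the same SOS).
mag_true = np.abs(np.fft.fft(b_true, 64))
mag_se = np.abs(np.fft.fft(b_se, 64))
```

Flipping any subset of zeros (rather than only those outside the unit circle) would produce the other SE models, which is why second-order statistics alone cannot select the true one.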


4.  Maximum  kurtosis  property 


Assuming that x is fourth-order white noise, the normalized kurtosis, noted
$\bar{\gamma}_{4,e'}$, of the prediction error e'(θ) is given by:

$$\bar{\gamma}_{4,e'} = \frac{\gamma_{4,e'}}{\gamma_{2,e'}^{2}} = \frac{\sum_{n=-\infty}^{+\infty}\big(h'_{\theta_0} * h'^{-1}_{\theta}\big)^{4}[n]}{\Big(\sum_{n=-\infty}^{+\infty}\big(h'_{\theta_0} * h'^{-1}_{\theta}\big)^{2}[n]\Big)^{2}}\;\bar{\gamma}_{4,x'} \qquad (9)$$

where $\gamma_{2,e'}$ is the variance of e'(θ), $\gamma_{4,e'}$ its kurtosis
(defined as the value at the origin of its fourth-order cumulant) and
$\bar{\gamma}_{4,x'}$ is the normalized kurtosis of x'. Its value is "a
distance" between the statistics of x' and gaussianity.

So, the relation (9) leads to:

$$|\bar{\gamma}_{4,e'}| \le |\bar{\gamma}_{4,x'}| \qquad (10)$$

if we suppose that $h'_{\theta_0}$ and $h'_{\theta}$ are stable with stable
inverses. The equality is obtained if and only if $h'_{\theta} = h'_{\theta_0}$.

The relation (10) implies that any filtered version of a NGWN is "more
Gaussian" than the noise itself. So, as proposed by Donoho [2], a means for
a good non-minimum phase blind identification is to maximize the absolute
value of the normalized kurtosis of the estimated input.

The method proposed by Boumahdi [1] consists of two steps. Firstly, the SE
minimum phase model is identified by using classical methods assuming
causality and minimum phase. Then, the estimated model is the one which
maximizes $|\bar{\gamma}_{4,e'}|$ among the SE models. The interest of this
method, which we call the "maximum kurtosis" (MK) method, is that we
estimate the numerical values of the non-causal ARMA parameters by using
only SOS, and thus with lowest estimation variance, the HOS (kurtosis) being
used as a criterion of choice among all the SE models to recover the phase
of the true model.
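The attenuation factor behind the maximum kurtosis property can be illustrated in Python. The sketch below assumes the standard result that filtering an i.i.d. sequence by g[n] multiplies its normalized kurtosis by Σg⁴/(Σg²)²; the function names are hypothetical, and `normalized_kurtosis` is a plain sample estimator added for illustration.

```python
import numpy as np

def kurtosis_attenuation(g):
    """Factor sum(g^4) / (sum(g^2))^2 applied to the normalized kurtosis."""
    g = np.asarray(g, dtype=float)
    return np.sum(g ** 4) / np.sum(g ** 2) ** 2

def normalized_kurtosis(e):
    """Sample normalized kurtosis: fourth cumulant over squared variance."""
    e = np.asarray(e, dtype=float)
    e = e - np.mean(e)
    m2 = np.mean(e ** 2)
    m4 = np.mean(e ** 4)
    return m4 / m2 ** 2 - 3.0

# A pure delay (perfect deconvolution) leaves the kurtosis untouched,
# while any residual filter with more than one tap reduces its magnitude.
factor_delay = kurtosis_attenuation([0.0, 1.0, 0.0])
factor_two_taps = kurtosis_attenuation([1.0, 1.0])
```

The factor never exceeds one and reaches one only for a single nonzero tap, which is why the SE model with the largest |normalized kurtosis| of the residual is selected.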


5.  Improvement  of  the  method 

In practice, given N consecutive samples of the system output
{y[n]}_{n∈[1,N]}, we want to estimate the true parameters θ_0. So, to
estimate a SE model, we minimize the following criterion:

$$\hat{J}(\theta) = \frac{1}{N}\sum_{n=1}^{N} e'^{2}(\theta)[n] \qquad (11)$$

Theoretically, the knowledge of the minimum of (8) corresponding to one SE
model allows us to determine the minima of (8) corresponding to the other SE
models. But this is false in practice with the criterion (11), due to the
finite signal length. The principle of the improvement of the MK method is
to minimize the criterion (11) again, in the direction of the minimum for
which the phase corresponds to the true model (after maximization of the
kurtosis). It leads to the following algorithm:
- Initialization: minimization of J(θ) (convergence to a local minimum).

- Recurrence on the two following steps:

a) determination of the model phase by maximization of
$|\bar{\gamma}_{4,e'}|$ among all the SE models,

b) minimization of the criterion (11) with initialization by the model
maximizing $|\bar{\gamma}_{4,e'}|$.

We stop the recurrence when we obtain the same result on two successive
iterations. This recurrence gives a better reconstruction of the phase of
the estimated model and improves parameter estimation accuracy, compared to
the MK method and those given in [1]. Note that this method is nothing but
the maximum likelihood estimation method under the Gaussian hypothesis,
which gives local minima around all the models SE to the actual one. The
maximization of the kurtosis makes it possible to determine the actual
model phase. Hence, the covariance matrix of the parameter estimates can be
calculated using the formula found in [6]. The main disadvantage of the
"improved maximum kurtosis" (IMK) method is that it cannot identify ARMA
models having an all-pass factor.

6.  Simulation  results 

In this section, we present three simulation examples to illustrate the
proposed IMK method and to compare it with other methods proposed in the
literature. To reduce realization dependency of the simulations, we averaged
over 100 independent Monte Carlo runs (MCR). The input white noise x used is
exponentially distributed ($\gamma_{2,x} = 1$ and $\gamma_{3,x} = 2$).
Tables 1, 2 and 3 display the estimated parameters (mean and one standard
deviation (Std)) for each method.

Example 1: We consider the following true MA(5) model [8]:

$$Zh_{\theta_0}(z) = 1 + 0.1\,z^{-1} - 1.87\,z^{-2} + 3.02\,z^{-3} - 1.435\,z^{-4} + 0.49\,z^{-5}$$

with zeros located at 0.25 ± j0.433, 0.7 ± j0.7 and -2.
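The stated zero locations of the MA(5) model can be verified with a few lines of Python. This is only a consistency check on the quoted coefficients; the variable names are hypothetical.

```python
import numpy as np

# MA(5) coefficients of Example 1, ascending powers of z^-1.
b = np.array([1.0, 0.1, -1.87, 3.02, -1.435, 0.49])

# Zeros of the transfer function: roots of z^5 + 0.1 z^4 - ... + 0.49.
zeros = np.roots(b)

# Zero locations quoted in the text (0.433 is a rounded value).
quoted = np.array([0.25 + 0.433j, 0.25 - 0.433j,
                   0.7 + 0.7j, 0.7 - 0.7j, -2.0])

# Distance from each quoted zero to the nearest computed zero.
errors = np.array([np.min(np.abs(zeros - q)) for q in quoted])
n_outside = int(np.sum(np.abs(zeros) > 1.0))
```

Only the zero at -2 lies outside the unit circle, so the model is non-minimum phase and its SE models differ in whether that zero (or any of the others) is reflected.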

The signal length used is N = 1024 samples. The IMK method is compared to
the methods presented in [8], [3] and [1]. The methods of [8] and [3] are
"linear algebra solutions" and use different slices of second- and
third-order cumulants. Table 1 shows the simulation results. We can see that
the IMK method gives estimates having lower bias and variance than the other
methods.


Table 1. Parameter estimates: Example 1, 100 MCR, N = 1024, mean and Std.

Method          β[1]      β[2]      β[3]      β[4]      β[5]
True value      0.100    -1.870     3.020    -1.4350    0.490
[8]    Mean    -0.2684   -0.7191    1.1103   -0.3138    0.0246
       Std      0.5212    0.7051    0.8711    0.8518    0.4240
[3]    Mean    -0.2897   -1.0359    1.9145   -0.9173    0.3283
       Std      0.6067    0.9641    1.0694    0.5525    0.3172
[1]    Mean     0.1405   -1.8038    2.8593   -1.2613    0.4531
       Std      0.5171    0.9751    1.6299    1.6843    0.5440
IMK    Mean     0.1404   -1.9367    3.0718   -1.4375    0.4860
       Std      0.2288    0.3593    0.2865    0.1038    0.0650

Example 2: Consider the following non-causal AR(3) model [7]:

$$Zh_{\theta_0}(z) = \frac{1}{1 - 2.05\,z^{-1} + 1.65\,z^{-2} - 0.8125\,z^{-3}}$$

with poles located at 0.4 ± j0.7 and 1.25.

The signal length used is N = 256 samples. The IMK method is compared to the
methods presented in [1] and [4, Theorem 2], the latter using third-order
cumulants. Table 2 displays the simulation results. We can see that the IMK
method has lower variance than the other methods, and that the method of
Boumahdi [1] is better than that of [4], which uses only HOS.

Example 3: The true non-causal ARMA(2,1) model [5] used is given by:

$$Zh_{\theta_0}(z) = \frac{1 - 0.5\,z^{-1}}{1 - 0.5\,z^{-1} - 0.9375\,z^{-2}}$$

with two poles located at -0.75 and 1.25, and a zero at 0.5.

The signal length used is N = 512 samples. The IMK method is compared to the
MK method (the version of [1] allowing identification of non-causal ARMA
models). The simulation results are presented in Table 3. We can see that
the IMK method provides estimates with lower variance and bias than the MK
method.
Table 2. Parameter estimates: Example 2, 100 MCR, N = 256, mean and Std.

Method          a[1]      a[2]      a[3]
True value     -2.050     1.650    -0.8125
[4]    Mean    -1.5618    0.9837   -0.3444
       Std      0.1353    0.1778    0.1158
[1]    Mean    -2.0782    1.6673   -0.8192
       Std      0.1547    0.1501    0.0959
IMK    Mean    -2.0743    1.6741   -0.8229
       Std      0.1500    0.1467    0.0947

Table 3. Parameter estimates: Example 3, 100 MCR, N = 512, mean and Std.

Method          a[1]      a[2]      β[1]
True value     -0.500    -0.9375   -0.500
MK     Mean    -0.5315   -0.9625   -0.4829
       Std      0.1078    0.1028    0.0837
IMK    Mean    -0.5309   -0.9594   -0.4908
       Std      0.0998    0.1000    0.0844

The simulation results show that the IMK method (the improved MK method)
improves the accuracy of the parameter estimation compared to the MK method,
because the improvement (see section 5) makes it possible to reach the
correct minimum of the criterion (11) and to decrease the rate of false
reconstruction of the phase.

7.  Conclusion 

We proposed in this paper a new method (IMK) to identify non-causal ARMA
models, which is an extension of the Boumahdi method [1], using the maximum
kurtosis property. This extension is based on a generalization of the
minimum variance approach to the non-causal case. We have shown that it is
necessary to introduce an ARMA model with separation of the causal and
anti-causal parts. Then, the maximum likelihood estimator (under the
Gaussian hypothesis) has several local minima corresponding to all the
models SE to the actual one. Finally, we used the maximum kurtosis property
to identify the phase of the true model from one of these local minima (MK
method).

Then, we have presented the IMK method, which consists of a closed-loop form
of the MK method. The recurrence introduced in the IMK method improves the
accuracy of the parameter estimation and gives a better reconstruction of
the phase of the estimated model, in comparison with the MK method.
Moreover, for the IMK method, the covariance matrix of the parameter
estimates is known and can be calculated with the formula found in [6].

The problem is that this method cannot identify ARMA models with all-pass
factors. But the simulation results showed the good performance of the IMK
method, compared to methods using HOS, for the identification of non-causal
MA, AR and ARMA models without all-pass factors. Its interest is that it
uses all the SOS information for identification, HOS being used only as a
criterion of choice among all the SE models to determine the model phase.
ABSTRACT

The problem considered is that of identification of unknown
parameters of multivariable, linear "errors-in-variables" models,
i.e., linear systems where measurements of both input and output of
the system are noise contaminated. Attention is focused on
frequency-domain approaches where the integrated polyspectrum
(bispectrum or trispectrum) of the input and the integrated
cross-polyspectrum, respectively, of the given time-domain
input-output data are exploited. We first develop (computable)
expressions for the covariance matrix of the system transfer
function estimate and show that the system transfer function matrix
estimate is asymptotically complex Gaussian. Then we propose and
analyze a pseudo-maximum likelihood (PML) estimator of system
parameters using the developed statistics of the system transfer
function estimate. Finally, two simulation examples are presented.


1.  INTRODUCTION 

It  is  a  common  practice  in  system  identification  literature 
to  assume  that  measurements  of  the  system  output  are 
noisy  but  the  measurements  of  the  input  to  the  system  are 
perfect.  This  assumption  is  not  necessarily  true  in  system 
and  control  applications  where  the  input  is  not  under  the 
analyst's  (complete)  control;  rather,  it  can  only  be  mea¬ 
sured.  Clearly,  it  may  not  always  be  possible  to  neglect 
the  noise  introduced  by  the  sensor  itself,  or  by  the  ambient 
environment.  In  this  paper  we  propose  to  investigate  multi- 
variable  system  identification  under  “symmetric  modeling” 
[1],[2]  of  stochastic  systems  by  allowing  the  input  measure¬ 
ments  also  to  be  noise  contaminated,  in  addition  to  having 
noisy  output  measurements.  Such  models  are  called  errors- 
in-variables  models  in  the  econometrics  literature.  Past 
work  on  dynamic  system  identification  with  noisy  input  has 
concentrated overwhelmingly on exploitation of second-order
statistics [1]-[4]. It is known that, in general, there does not
exist a unique solution when only second-order statistics are used.
Use of higher-order statistics can alleviate this problem at the
cost of higher-variance estimates [5]-[9]. Our recent work [9]
dealing with single-input/single-output models has utilized
frequency-domain approaches coupled with a novel entity, the
integrated polyspectrum, which leads to computationally simpler and
statistically more accurate parameter estimates than heretofore
possible with higher-order statistics, including the approaches of
[5]-[8]. In [10], identifiability and consistency issues for MIMO
systems were considered. No simulation examples were provided in
[10].

This work was supported by the National Science Foundation under
Grant ECS-9504878.

The  objective  of  this  paper  is  threefold: 

•  To develop (computable) expressions for the covariance matrix
of the system transfer function estimate (which is based upon
certain integrated polyspectrum estimates), and to show that the
system transfer function estimate is asymptotically complex
Gaussian.

•  To  propose  and  analyze  a  pseudo-maximum  likelihood 
estimator  of  system  parameters  using  the  developed 
statistics  of  the  system  transfer  function  estimate. 

•  To  provide  computer  simulation  examples. 

2.  MODEL  ASSUMPTIONS 

The true system (i.e., the one generating the data) is given by

$$s(t) = \mathcal{H}(q)\,u(t) \tag{2-1}$$

where the $p$-column vector $s(t)$ is the system output, $t$ is
discrete time, and the $m$-column vector $u(t)$ is the system
input. With $q^{-1}$ denoting the backward-shift operator (i.e.,
$q^{-1}u(t) = u(t-1)$), the linear system $\mathcal{H}(q)$
represents an IIR (infinite impulse response) system:

$$\mathcal{H}(q) = \sum_{i=0}^{\infty} H_i\, q^{-i} \tag{2-2}$$

Noisy measurements of the system input and output are available as

$$x(t) = u(t) + v_i(t), \qquad y(t) = s(t) + v_o(t). \tag{2-3}$$

Given an input-output (IO) record $\{x(t), y(t),\ t = 1, 2, \ldots\}$,
with the underlying true system model $\mathcal{H}(q)$ unknown, it
is of much interest in control, communications and signal
processing applications to fit a rational function model

$$\mathcal{G}(q) := A^{-1}(q)B(q) = \Big[I + \sum_{i=1}^{n_a} A_i q^{-i}\Big]^{-1} \Big[\sum_{i=1}^{n_b} B_i q^{-i}\Big] \tag{2-4}$$

to the given input-output record. Define the unknown parameter
vector

$$\theta = \{\mathrm{vec}\{A_i\},\ 1 \le i \le n_a,\ \mathrm{vec}\{B_j\},\ 1 \le j \le n_b\} \tag{2-5}$$

where vec denotes the column stacking operator [18]. We wish to
estimate $\theta$ given certain statistics of the input-output data
record $\{x(t), y(t),\ t = 1, 2, \ldots\}$. When it exists, the true
value of $\theta$ will be denoted by $\theta_0$.
The following conditions are assumed to be true.

(AS1) The true system transfer function $\mathcal{H}(q)$ is causal
and stable. Therefore, $\sum_{i=0}^{\infty} \|H_i\| < \infty$.

(AS2) All the processes involved (i.e., $x(t)$, $y(t)$, $v_i(t)$
and $v_o(t)$) are zero-mean and jointly stationary. Furthermore,
the noise sequences $\{v_i(t)\}$ and $\{v_o(t)\}$ are independent
of $\{u(t)\}$, hence of $\{s(t)\}$. The integrated polyspectrum
(defined in Sec. 3) $S_{r_\ell u}(\omega)$ ($\ell = 2$ or 3) of
$\{u(t)\}$ is invertible for all $\omega \in [0, \pi]$ if the
proposed approaches utilize the entire frequency range $[0, \pi]$.
If a finite number of frequencies are used then
$S_{r_\ell u}(\omega)$ need be invertible only for this frequency
set. Finally, [...] and $S_{vv}(\omega)$ is invertible whenever
$S_{r_\ell u}(\omega)$ is, where $v(t) := v_o(t) - \mathcal{H}(q)v_i(t)$.

(AS3) The cumulant/cross-cumulant sequences of the various
processes involved satisfy the following summability conditions:

$$\sum_{\tau_1,\ldots,\tau_{k-1}=-\infty}^{\infty} [1 + |\tau_j|]\,|C_{z_1 z_2 \cdots z_k}(\tau_1,\ldots,\tau_{k-1})| < \infty$$

for each $j = 1, 2, \ldots, k-1$ and each $k = 2, 3, \ldots$, where
$z_i(t)$ denotes a component of any of the involved processes such
as $u(t)$, $s(t)$, $v_i(t)$, etc., and
$C_{z_1 z_2 \cdots z_k}(\tau_1,\ldots,\tau_{k-1})$ denotes the $k$th
joint cumulant of the random variables
$\{z_1(t+\tau_1), \ldots, z_{k-1}(t+\tau_{k-1}), z_k(t)\}$.

(AS4)  The  noise  processes  are  jointly  Gaussian  if  we  ex¬ 
ploit  the  integrated  trispectrum  (defined  in  Sec.  3)  of 
the  data.  The  noise  processes  are  assumed  to  have 
vanishing  bispectrum  (defined  in  Sec.  3)  if  we  exploit 
the  integrated  bispectrum  of  the  data. 

3.  TRANSFER  FUNCTION  ESTIMATOR 

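Define $m$-column vectors $r_{3u}(t)$ and $r_{2u}(t)$ whose $i$-th
components $r_{3u_i}(t)$ and $r_{2u_i}(t)$, respectively, are given
by

$$r_{3u_i}(t) = u_i^3(t) - 3u_i(t)E\{u_i^2(t)\} - E\{u_i^3(t)\}, \tag{3-1}$$

$$r_{2u_i}(t) = u_i^2(t) - E\{u_i^2(t)\}, \tag{3-2}$$

where $u_i(t)$ is the $i$-th component of $u(t)$. We will use the
notation $r_{3x}(t)$ and $r_{2x}(t)$ to denote the above vectors
when $u(t)$ is replaced with $x(t)$. Mimicking the univariate
formulation of [9], we now define a "component-by-component"
integrated bispectrum of $u(t)$ as

$$S_{r_2 u}(\omega) = \sum_{k=-\infty}^{\infty} E\{r_{2u}(t)\,u^T(t+k)\}\,e^{-j\omega k}. \tag{3-3}$$

Similarly we have the integrated trispectrum

$$S_{r_3 u}(\omega) = \sum_{k=-\infty}^{\infty} E\{r_{3u}(t)\,u^T(t+k)\}\,e^{-j\omega k}. \tag{3-4}$$

Consider the cross-spectrum ($\ell = 2, 3$)

$$S_{r_\ell y}(\omega) = \sum_{k=-\infty}^{\infty} E\{r_{\ell u}(t)\,y^T(t+k)\}\,e^{-j\omega k}. \tag{3-5}$$

Since $v_o(t)$ is zero-mean and independent of $u(t)$ (AS2), it
follows from (2-1) and (2-3) that
$S_{r_\ell y}(\omega) = S_{r_\ell u}(\omega)\,\mathcal{H}^T(e^{j\omega})$.
Therefore, if $S_{r_\ell u}^{-1}(\omega)$ exists, then we have the
transfer function matrix expression

$$\mathcal{H}(e^{j\omega}) = [S_{r_\ell u}^{-1}(\omega)\,S_{r_\ell y}(\omega)]^T \quad (\ell = 2, 3). \tag{3-6}$$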
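
To see why the construction in (3-1)-(3-6) is insensitive to Gaussian input noise, the following is a minimal sketch for a static scalar system (a lag-0, single-channel analogue, not the paper's frequency-domain MIMO estimator). The gain `H_TRUE`, the exponential input, the noise level, and all function names are illustrative assumptions. The ordinary least-squares ratio is attenuated by the input noise, while the ratio built from $r_{2x}(t) = x^2(t) - \text{mean}(x^2)$ stays close to the true gain because the third-order moments of zero-mean Gaussian noise vanish.

```python
import random


def simulate_static_eiv(n, h, noise_std, seed=0):
    """Static scalar errors-in-variables data: x = u + v_i, y = h*u + v_o."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.expovariate(1.0) - 1.0                # zero-mean, skewed input
        xs.append(u + rng.gauss(0.0, noise_std))      # noisy input measurement
        ys.append(h * u + rng.gauss(0.0, noise_std))  # noisy output measurement
    return xs, ys


def least_squares_gain(xs, ys):
    """Second-order estimate sum(x*y)/sum(x*x); biased when x is noisy."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def bispectrum_like_gain(xs, ys):
    """Lag-0 analogue of (3-6) using r_2x(t) = x^2(t) - mean(x^2)."""
    mu2 = sum(x * x for x in xs) / len(xs)
    r2 = [x * x - mu2 for x in xs]
    num = sum(r * y for r, y in zip(r2, ys))
    den = sum(r * x for r, x in zip(r2, xs))
    return num / den


H_TRUE = 2.0  # hypothetical true gain
xs, ys = simulate_static_eiv(200_000, H_TRUE, noise_std=0.7)
h_ls = least_squares_gain(xs, ys)
h_ib = bispectrum_like_gain(xs, ys)
print(f"least squares: {h_ls:.3f}, bispectrum-like: {h_ib:.3f}")
```

The price of the higher-order estimate is a larger variance for a given record length, which is the trade-off noted in the Introduction.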

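Given a record of length $T$, let $X(\omega_k)$ denote the DFT of
$\{x(t),\ 1 \le t \le T\}$ given by

$$X(\omega_k) = \sum_{t=0}^{T-1} x(t+1)\exp(-j\omega_k t) \tag{3-7}$$

where

$$\omega_k = \frac{2\pi k}{T}, \quad k = 0, 1, \ldots, T-1. \tag{3-8}$$

Similarly define $Y(\omega_k)$. Also let $R_{\ell x}(\omega_k)$
denote the DFT of $\{r_{\ell x}(t)\}$ obtained by using the
relations (recall (3-1) and (3-2))

$$r_{3x_j}(t) = x_j^3(t) - 3x_j(t)\,\mu_{2x_j}, \tag{3-9}$$

$$\mu_{2x_j} = \frac{1}{T}\sum_{t=1}^{T} x_j^2(t), \tag{3-10}$$

$$r_{2x_j}(t) = x_j^2(t), \tag{3-11}$$

($j = 1, 2, \ldots, m$). Given the above DFTs, following [11,
Sec. 7.4] we define the cross-spectrum estimators as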


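$$\hat{S}_{r_\ell x}(k) = \frac{1}{2m_T+1}\sum_{i=-m_T}^{m_T} \frac{1}{T}\,R_{\ell x}^{*}(\omega_{k+i})\,X^T(\omega_{k+i}), \tag{3-12}$$

$$\hat{S}_{r_\ell y}(k) = \frac{1}{2m_T+1}\sum_{i=-m_T}^{m_T} \frac{1}{T}\,R_{\ell x}^{*}(\omega_{k+i})\,Y^T(\omega_{k+i}). \tag{3-13}$$

In light of (3-12) define a coarser frequency grid:

$$\tilde{\omega}_l = \frac{2\pi\, l\,(2m_T+1)}{T} \tag{3-14}$$

with $l = 0, 1, \ldots, L_T - 1$ where $L_T = \lfloor T/(2m_T+1) \rfloor$.
Using the estimated spectra we have an estimator of the system
transfer function at frequency $\omega_k$ (as in [11]):

$$\hat{\mathcal{H}}(e^{j\omega_k}) = [\hat{S}_{r_\ell x}^{-1}(k)\,\hat{S}_{r_\ell y}(k)]^T \tag{3-15}$$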

provided that $\hat{S}_{r_\ell x}^{-1}(k)$ exists. Choose $m_T$ to
be such that as $T \to \infty$, we have $m_T T^{-1} \to 0$ and
$m_T \to \infty$. If $S_{r_\ell u}^{-1}(\omega_k)$ exists, then it
follows from [11, Thm. 8.11.1] that

$$\lim_{T\to\infty} \hat{\mathcal{H}}(e^{j\omega_{k(T)}}) = \mathcal{H}(e^{j2\pi f}) \quad \text{w.p.1} \tag{3-16}$$

where $\lim_{T\to\infty} k(T)/T = f$. Convergence in (3-16) is
uniform in $f$. In addition, $\hat{\mathcal{H}}(e^{j\omega_k})$ for
$k = l(2m_T+1)$, $l = 0, 1, \ldots, (L_T/2) - 1$, are
(asymptotically) independent.

Let $k_l(T)$ with $T = 1, 2, \ldots$ be a sequence of integers such
that $\lim_{T\to\infty} k_l(T)/T = f_l$, a fixed frequency (in Hz).
We may take these integers to belong to the coarser grid
$\{k \mid k = l(2m_T+1),\ l = 0, 1, \ldots, (L_T/2) - 1\}$. Under
the cumulant summability conditions stated in assumption (AS3), it
follows from [11, Thm. 7.4.1] that

$$\lim_{T\to\infty} E\{\hat{S}_{r_\ell x}(k_l(T))\} = S_{r_\ell x}(2\pi f_l), \tag{3-17}$$

$$\lim_{T\to\infty} E\{\hat{S}_{r_\ell y}(k_l(T))\} = S_{r_\ell y}(2\pi f_l). \tag{3-18}$$

Moreover, it follows from [11, Thm. 7.4.3] that for
$0 < f_l, f_m < 0.5$

$$\lim_{T\to\infty} A_T\,\mathrm{cov}\{[\hat{S}_{r_\ell x}(k_l(T))]_{i_1 j_1},\ [\hat{S}_{r_\ell y}(k_m(T))]_{i_2 j_2}\} = \delta(l-m)\,[S_{r_\ell r_\ell}(2\pi f_l)]_{i_1 i_2}\,[S_{yx}(2\pi f_l)]_{j_2 j_1}, \tag{3-19}$$

$$\lim_{T\to\infty} A_T\,\mathrm{cov}\{[\hat{S}_{r_\ell x}(k_l(T))]_{i_1 j_1},\ [\hat{S}^{*}_{r_\ell y}(k_m(T))]_{i_2 j_2}\} = \delta(l-m)\,[S_{r_\ell y}(2\pi f_l)]_{i_1 j_2}\,[S_{r_\ell x}(2\pi f_l)]_{i_2 j_1}, \tag{3-20}$$

where

$$A_T = 2m_T + 1, \tag{3-21}$$

$\mathrm{cov}\{X, Y\} = E\{XY^{*}\} - E\{X\}E\{Y^{*}\}$, $[A]_{ij}$
denotes the $ij$-th element of matrix $A$, and other covariances
may be deduced in a similar manner. Convergence in (3-17)-(3-20)
is uniform in $f$.

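Consider a fixed set of $M$ frequencies $\lambda_l$ for
$l = 1, 2, \ldots, M$ such that
$0 < \lambda_1 < \lambda_2 < \cdots < \lambda_M < \pi$, where
$\lambda_l = 2\pi f_l$. Using (3-17)-(3-21), a perturbation
expansion of $\hat{S}_{r_\ell x}^{-1}(k_l(T))\,\hat{S}_{r_\ell y}(k_l(T))$
yields (see also proofs of Thm. 8.7.1 and Lemma P8.1 in [11,
pp. 449-450] pertaining to power spectrum and cross-power spectrum)

$$\begin{aligned}
\hat{S}_{r_\ell x}^{-1}(k_l(T))\,\hat{S}_{r_\ell y}(k_l(T)) ={}& (E\{\hat{S}_{r_\ell x}(k_l(T))\})^{-1} E\{\hat{S}_{r_\ell y}(k_l(T))\} \\
&+ (E\{\hat{S}_{r_\ell x}(k_l(T))\})^{-1} [\hat{S}_{r_\ell y}(k_l(T)) - E\{\hat{S}_{r_\ell y}(k_l(T))\}] \\
&- (E\{\hat{S}_{r_\ell x}(k_l(T))\})^{-1} [\hat{S}_{r_\ell x}(k_l(T)) - E\{\hat{S}_{r_\ell x}(k_l(T))\}] \\
&\quad\times (E\{\hat{S}_{r_\ell x}(k_l(T))\})^{-1} E\{\hat{S}_{r_\ell y}(k_l(T))\} \\
&+ o_p(A_T^{-0.5}),
\end{aligned} \tag{3-22}$$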

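where $o_p(a_n)$ denotes a random sequence $\{z_n\}$ such that
$\lim_{n\to\infty} z_n a_n^{-1} = 0$ in probability (i.p.). Using
the asymptotic distribution of the estimators (3-12) and (3-13) on
the coarse grid (3-14), [11, Thm. P5.2] and (3-6), it follows (as
in [11, Thm. 8.8.1]) that $\hat{\mathcal{H}}(e^{j\lambda_l})$ for
$l = 1, 2, \ldots, M$ are (asymptotically) jointly complex
(circularly symmetric) Gaussian such that

$$\lim_{T\to\infty} A_T\,\mathrm{cov}\{\mathrm{vec}(\hat{\mathcal{H}}(e^{j\lambda_k})),\ \mathrm{vec}(\hat{\mathcal{H}}(e^{j\lambda_l}))\} = \Sigma(\lambda_k)\,\delta(k-l) \tag{3-23}$$

where

$$\Sigma(\lambda_k) := [\,\cdots\,](\lambda_k) \otimes [S_{yy}(\lambda_k) - S_{yu}(\lambda_k)\,S_{uu}^{-1}(\lambda_k)\,S_{uy}(\lambda_k)] \tag{3-24}$$

and $\otimes$ denotes the Kronecker product. Thus,
$\hat{\mathcal{H}}(e^{j\omega_k})$ on the coarse grid (3-14) is
asymptotically a complex Gaussian (in the sense of [11, Sec. 4.2])
random variable, independent at distinct frequencies on the coarse
grid over $(0, \pi)$, with the covariance structure (3-23).

Remark 1. In the rest of the paper we will use $\omega_k$ to denote
a frequency on the coarse grid (3-14) with
$k = 0, 1, \ldots, L_T - 1$, but we will use $\lambda_k$ to denote a
fixed frequency independent of the record length $T$. •

4. A PSEUDO-MAXIMUM LIKELIHOOD SOLUTION

As in [10], consider a quadratic transfer function matching
approach. Let $\mathcal{G}(e^{j\omega}; \theta)$ denote the
transfer function of (1-3) with the system parameters $\theta$ as
defined in (2-1). We choose $\theta$ to minimize the cost

$$J_{1T}(\theta; M, \Omega_L, \Omega_U) = \sum_{l=1}^{M} \big\| \hat{h}(e^{j\lambda_l}) - g(e^{j\lambda_l}; \theta) \big\|^2_{W(\lambda_l)} \tag{4-1}$$

where $0 < \Omega_L < \lambda_1 < \lambda_2 < \cdots < \lambda_M < \Omega_U < \pi$,

$$\hat{h}(e^{j\omega}) := \mathrm{vec}\{\hat{\mathcal{H}}(e^{j\omega})\}, \qquad g(e^{j\omega}; \theta) := \mathrm{vec}\{\mathcal{G}(e^{j\omega}; \theta)\},$$

$\|e\|^2_{W(\omega)} = e^H W(\omega)\, e$ and $W(\omega)$ is a
positive-definite, Hermitian weighting matrix. This is a nonlinear
iterative optimization problem which is initialized by the
consistent, closed-form equation-error solution of Sec. 3.2 of
[10].

In light of the results of Sec. 3, let us pick

$$W^{-1}(\lambda_l) = A_T^{-1}\,\Sigma(\lambda_l) = \mathrm{cov}\{\hat{h}(e^{j\lambda_l}),\ \hat{h}(e^{j\lambda_l})\}. \tag{4-3}$$

Then (4-1) leads to a (pseudo) maximum likelihood (ML) parameter
estimator (asymptotically). Since, in practice, $\Sigma(\lambda_l)$
is unknown, we replace it with its consistent estimator
$\hat{\Sigma}(\lambda_l)$, which is obtained from (3-24) by
replacing all the quantities therein with their consistent
estimators such as (3-12) and (3-13). This leads to the cost

$$J_{2T}(\theta; M, \Omega_L, \Omega_U) = \sum_{l=1}^{M} [\hat{h}(e^{j\lambda_l}) - g(e^{j\lambda_l}; \theta)]^H\, \hat{\Sigma}^{-1}(\lambda_l)\, [\hat{h}(e^{j\lambda_l}) - g(e^{j\lambda_l}; \theta)]. \tag{4-4}$$

It follows [11] that under (AS1)-(AS4), we have
$\lim_{T\to\infty} \hat{\Sigma}(\lambda_l) = \Sigma(\lambda_l)$
w.p.1 uniformly in $\lambda_l$.

Proof of the following result is omitted.

Lemma 1. (A) Under (AS1)-(AS4) such that
$0 < \Omega_L < \lambda_1 < \lambda_2 < \cdots < \lambda_M < \Omega_U < \pi$,
$\lim_{T\to\infty} J_{2T}(\theta; M, \Omega_L, \Omega_U) = J_{2\infty}(\theta; M, \Omega_L, \Omega_U)$
uniformly in $\theta$ for $\theta \in \Theta_C$, any compact set,
where

$$J_{2\infty}(\theta; M, \Omega_L, \Omega_U) = \sum_{l=1}^{M} [h(e^{j\lambda_l}) - g(e^{j\lambda_l}; \theta)]^H\, \Sigma^{-1}(\lambda_l)\, [h(e^{j\lambda_l}) - g(e^{j\lambda_l}; \theta)] \tag{4-5}$$

and $h(e^{j\omega}) := \mathrm{vec}\{\mathcal{H}(e^{j\omega})\}$.

(B) In addition, let the set of frequencies above become dense in
the interval $[\Omega_L, \Omega_U]$ as $M \to \infty$, where the
$\lambda_l$'s are spaced uniformly in this interval. Then
$\lim_{M,T\to\infty} M^{-1} J_{2T}(\theta; M, \Omega_L, \Omega_U) = J_{2\infty}(\theta; \infty, \Omega_L, \Omega_U)$
uniformly in $\theta$ for $\theta \in \Theta_C$, where

$$J_{2\infty}(\theta; \infty, \Omega_L, \Omega_U) = \frac{1}{\Omega_U - \Omega_L} \int_{\Omega_L}^{\Omega_U} [h(e^{j\omega}) - g(e^{j\omega}; \theta)]^H\, \Sigma^{-1}(\omega)\, [h(e^{j\omega}) - g(e^{j\omega}; \theta)]\, d\omega. \tag{4-6}$$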
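
As an illustration of the weighted cost in (4-4), the following minimal sketch evaluates it for the scalar case $p = m = 1$, where the weighting matrix reduces to one positive number per frequency. The first-order model, the frequency set, and the variance values are hypothetical stand-ins chosen only to make the example self-contained; they are not taken from the paper.

```python
import cmath


def pml_cost(h_hat, g_model, sigma_hat):
    """Scalar form of (4-4): sum over frequencies of |h_hat - g|^2 / sigma_hat."""
    return sum(abs(h - g) ** 2 / s for h, g, s in zip(h_hat, g_model, sigma_hat))


def first_order_model(theta, freqs):
    """Hypothetical model G(e^{jw}; theta) = b e^{-jw} / (1 + a e^{-jw})."""
    a, b = theta
    return [b * cmath.exp(-1j * w) / (1 + a * cmath.exp(-1j * w)) for w in freqs]


freqs = [0.5, 1.0, 1.5, 2.0, 2.5]      # fixed frequencies in (0, pi)
theta_true = (-0.6, 1.0)
h_hat = first_order_model(theta_true, freqs)   # stand-in for error-free estimates
sigma_hat = [0.1, 0.2, 0.4, 0.2, 0.1]          # stand-in variance estimates

cost_true = pml_cost(h_hat, first_order_model(theta_true, freqs), sigma_hat)
cost_off = pml_cost(h_hat, first_order_model((-0.5, 1.0), freqs), sigma_hat)
print(cost_true, cost_off)
```

Frequencies where the transfer function estimate has a large variance are down-weighted, which is what distinguishes this cost from an unweighted transfer function match.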
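5. CONVERGENCE ANALYSIS

Define

$$\hat{\theta}_{TM} = \arg\{\min_{\theta \in \Theta_C} J_{2T}(\theta; M, \Omega_L, \Omega_U)\}, \tag{5-1}$$

$$\bar{\theta}_{M} = \arg\{\min_{\theta \in \Theta_C} J_{2\infty}(\theta; M, \Omega_L, \Omega_U)\}, \tag{5-2}$$

$$\bar{\theta}_{\infty} = \arg\{\min_{\theta \in \Theta_C} J_{2\infty}(\theta; \infty, \Omega_L, \Omega_U)\}. \tag{5-3}$$

Using Lemma 1 and some standard arguments we can establish
Theorem 1.

Theorem 1. Under the hypotheses of Lemma 1, it follows that

$$\lim_{T\to\infty} \hat{\theta}_{TM} \in \mathcal{D}^{(M)}(\Omega_L, \Omega_U) := \{\theta \mid J_{2\infty}(\theta; M, \Omega_L, \Omega_U) = J_{2\infty}(\bar{\theta}_M; M, \Omega_L, \Omega_U)\} \tag{5-4}$$

and

$$\lim_{M\to\infty}\lim_{T\to\infty} \hat{\theta}_{TM} \in \mathcal{D}^{(\infty)}(\Omega_L, \Omega_U) := \{\theta \mid J_{2\infty}(\theta; \infty, \Omega_L, \Omega_U) = J_{2\infty}(\bar{\theta}_\infty; \infty, \Omega_L, \Omega_U)\}. \tag{5-5}$$

Proof: Mimic the proof of Theorems 1 and 3 in [5] using Lemma 1.
Note that the convergence to a set is to be interpreted in the
sense of Ljung [12, p. 215]. □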

Next we have the consistency result. It follows from [10, Thm. 1],
Theorem 1, and the conditions required for uniqueness of the full
polynomial form of the matrix fraction description of MIMO transfer
functions [3, Sec. 6.3]. First we need a definition.

Def. Sufficient Order Model Set: The true model $\mathcal{H}(q)$ is
of the type (1-3) such that the true model orders $n_{a0}$ and
$n_{b0}$ satisfy $\min(n_a - n_{a0},\ n_b - n_{b0}) \ge 0$. •

Theorem 2. Under (AS1)-(AS4), $\Omega_L > 0$, $\Omega_U < \pi$ and
sufficient order modeling such that $(2p-1)n_a + n_b \le 2M$, it
follows that

$$\lim_{T\to\infty} \hat{\theta}_{TM} \in \mathcal{D}^{(0)}$$

where

$$\mathcal{D}^{(0)} := \{\theta \mid A^{-1}(q; \theta)B(q; \theta) = \mathcal{H}(q)\}.$$

If $\min(n_a - n_{a0},\ n_b - n_{b0}) = 0$, $A(q; \theta_0)$ and
$B(q; \theta_0)$ are coprime and
$\mathrm{rank}[A_{n_{a0}} \,\vdots\, B_{n_{b0}}] = p$, then
$\mathcal{D}^{(0)}$ is a singleton equaling $\{\theta_0\}$. •

6.  SIMULATION  EXAMPLES 
6.1.  Example  1. 

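Referring to (1-1)-(1-3), the $2 \times 2$ true system model is
taken to be

$$\mathcal{H}(q) = A^{-1}(q)B(q) \tag{6-1}$$

with ($I_{2\times 2}$ denotes the $2 \times 2$ identity matrix)

$$A(q) = I_{2\times 2} + \begin{bmatrix} -1.5 & 0 \\ 0 & -1.8 \end{bmatrix} q^{-1} + \begin{bmatrix} 0.7 & 0 \\ 0 & 0.81 \end{bmatrix} q^{-2} \tag{6-2}$$

and

$$B(q) = \begin{bmatrix} 1.0 & 0 \\ 1.0 & 1.0 \end{bmatrix} q^{-1} + \begin{bmatrix} 0.5 & 0.8 \\ -0.9 & 0 \end{bmatrix} q^{-2}. \tag{6-3}$$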

The (noise-free) inputs are generated as

$$u_i(t) = -0.8\,u_i(t-1) + w_i(t) \quad (i = 1, 2) \tag{6-4}$$

where $\{w_i(t)\}$ is i.i.d., binary ($\pm 1$ with prob. 0.5 each)
with $\{w_1(t)\}$ independent of $\{w_2(t)\}$. The output
measurement noise is colored and its components are given by

$$v_{oi}(t) = 1.3725\,v_{oi}(t-1) - 0.9608\,v_{oi}(t-2) + e_{oi}(t) \quad (i = 1, 2) \tag{6-5}$$

where $\{e_{oi}(t)\}$ is i.i.d. zero-mean Gaussian with
$\{e_{o1}(t)\}$ independent of $\{e_{o2}(t)\}$. The input
measurement noise is colored and its components are given by

$$v_{ij}(t) = 1.3725\,v_{ij}(t-1) - 0.9608\,v_{ij}(t-2) + e_{ij}(t) \quad (j = 1, 2) \tag{6-6}$$

where $\{e_{ij}(t)\}$ is i.i.d. zero-mean Gaussian with
$\{e_{i1}(t)\}$ independent of $\{e_{i2}(t)\}$. The input noise
$v_i(t)$ is independent of $v_o(t)$. The two noise sequences are
scaled to achieve the desired SNR at the output
($= E\{\|s(t)\|^2\}/E\{\|v_o(t)\|^2\}$, where
$s(t) = \mathcal{H}(q)u(t)$) and at the input
($= E\{\|u(t)\|^2\}/E\{\|v_i(t)\|^2\}$).
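
The colored-noise model (6-5) and the SNR scaling can be sketched as follows for a single scalar channel. This is an illustration under stated assumptions: the helper names, the burn-in length, the record length, and the use of sample powers in place of the expectations are choices made here, and the input coefficient $-0.8$ follows the reading of (6-4) above. The pole computation shows that the AR(2) noise is narrowband, with poles close to the unit circle.

```python
import cmath
import math
import random

A1, A2 = 1.3725, -0.9608   # v(t) = A1*v(t-1) + A2*v(t-2) + e(t), as in (6-5)


def ar2_poles(a1, a2):
    """Roots of z^2 - a1*z - a2 = 0, i.e. the poles of the AR(2) noise model."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return (a1 + disc) / 2.0, (a1 - disc) / 2.0


def ar2_noise(n, a1, a2, rng, burn_in=2000):
    """Gaussian AR(2) noise; the first burn_in samples are discarded."""
    v1 = v2 = 0.0
    out = []
    for t in range(n + burn_in):
        v = a1 * v1 + a2 * v2 + rng.gauss(0.0, 1.0)
        v2, v1 = v1, v
        if t >= burn_in:
            out.append(v)
    return out


def ar1_binary_input(n, rng):
    """Scalar input u(t) = -0.8*u(t-1) + w(t) with w(t) = +/-1 equiprobable."""
    u, out = 0.0, []
    for _ in range(n):
        u = -0.8 * u + rng.choice((-1.0, 1.0))
        out.append(u)
    return out


def power(seq):
    return sum(v * v for v in seq) / len(seq)


def scale_to_snr(signal, noise, snr_db):
    """Rescale noise so that power(signal)/power(noise) equals snr_db in dB."""
    gain = math.sqrt(power(signal) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [gain * v for v in noise]


rng = random.Random(0)
poles = ar2_poles(A1, A2)
u = ar1_binary_input(4096, rng)
v_in = scale_to_snr(u, ar2_noise(4096, A1, A2, rng), snr_db=10.0)
achieved_snr_db = 10.0 * math.log10(power(u) / power(v_in))
print(abs(poles[0]), achieved_snr_db)
```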

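Given the AR coefficient estimates $\hat{A}_i^{(l)}$ in the $l$-th
Monte Carlo run, define the normalized mean-square error (NMSE) in
estimating the AR coefficients averaged over $M_c$ runs as

$$\mathrm{NMSE}_A = \frac{M_c^{-1}\sum_{l=1}^{M_c}\sum_{i=1}^{n_a} \|\hat{A}_i^{(l)} - A_i\|^2}{\sum_{i=1}^{n_a} \|A_i\|^2}.$$

Similarly, the NMSE in estimating the MA coefficients is defined as

$$\mathrm{NMSE}_B = \frac{M_c^{-1}\sum_{l=1}^{M_c}\sum_{i=1}^{n_b} \|\hat{B}_i^{(l)} - B_i\|^2}{\sum_{i=1}^{n_b} \|B_i\|^2}.$$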

The simulation results based on 100 Monte Carlo runs for two
different record lengths and SNR = 10 dB (at both the input and
the output) are shown in Table I for the approaches proposed in
[10, Sec. 3.2] and Sec. 4 under the sufficient-order case with
$n_a = n_{a0} = 2$ and $n_b = n_{b0} = 2$ (recall the definitions
of Sec. 4). In applying the proposed approaches, we selected
$2m_T + 1 = 11$ and 45 for record lengths of $T = 1024$ and 4096,
respectively. The $M$ frequency points were taken to be all the
points on the coarse grid (3-14) that lie in $(0, \pi)$. For
comparison we also show the results obtained using the classical
least-squares algorithm ([3, Sec. 7.1]) and the output error method
[3], [12]. For the output error method (which requires nonlinear
optimization) the initial guess was selected as


$$A(q) = [\,\cdots\,] \tag{6-7}$$

and

$$B(q) = I_{2\times 2}\,q^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} q^{-2}. \tag{6-8}$$


Comparing  (6-7)-(6-8)  with  (6-2)-(6-3)  notice  that  the  ini¬ 
tial  guess  is  not  too  far  from  the  true  values. 

It  is  seen  from  Table  I  that  the  proposed  approaches  per¬ 
form  quite  well.  Use  of  the  statistics  of  the  transfer  function 
estimates  does  improve  the  accuracy  of  the  parameter  es¬ 
timates.  Both  of  the  proposed  approaches  perform  better 
than  the  output  error  method. 
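
The frequency sets used in the examples can be enumerated with a short sketch. It assumes that the coarse grid consists of the DFT indices $k = l(2m_T+1)$ and that a grid point is retained when its frequency $2\pi k/T$ lies strictly inside the stated interval; the function name and the `upper_fraction` parameter are illustrative.

```python
def coarse_grid_indices(record_length, smoothing_width, upper_fraction=1.0):
    """DFT indices k = l*(2*m_T + 1), l >= 1, with 0 < 2*pi*k/T < upper_fraction*pi.

    smoothing_width is 2*m_T + 1; the bound 2*pi*k/T < f*pi is tested as 2*k < f*T.
    """
    indices = []
    l = 1
    while 2 * l * smoothing_width < upper_fraction * record_length:
        indices.append(l * smoothing_width)
        l += 1
    return indices


m_1024 = len(coarse_grid_indices(1024, 11))         # T = 1024, 2*m_T + 1 = 11
m_4096 = len(coarse_grid_indices(4096, 45))         # T = 4096, 2*m_T + 1 = 45
m_half = len(coarse_grid_indices(4096, 45, 0.5))    # grid restricted to (0, pi/2)
print(m_1024, m_4096, m_half)
```

Under these assumptions the number of frequency points stays in the mid-forties for both record lengths over $(0, \pi)$, because the smoothing width grows with $T$.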
6.2. Example 2.

This example is the same as Example 1 except for the input and
output noise sequences, which are now dependent. The output
measurement noise $v_o(t)$ is given by

$$v_o(t) = -0.9\,v_o(t-1) + \begin{bmatrix} e_{o1}(t) \\ e_{o2}(t) \end{bmatrix} \tag{6-9}$$

where $\{e_{oi}(t)\}$ is i.i.d. zero-mean Gaussian with
$\{e_{o1}(t)\}$ independent of $\{e_{o2}(t)\}$. The input
measurement noise $v_i(t)$ is given by

$$v_i(t) = 0.8\,v_i(t-1) + v_o(t). \tag{6-10}$$

The simulation results based on 100 Monte Carlo runs for a record
length of 4096 and SNR = 10 dB (at both the input and the output)
are shown in Table II under the sufficient-order case. In applying
the proposed approaches, we selected $2m_T + 1 = 45$. The $M$
frequency points were taken to be all the points on the coarse grid
(3-14) that lie in $(0, \pi/2)$. It is seen from Table II that the
proposed approaches are vastly superior to the output error method.

7.  CONCLUSIONS 

The parametric frequency-domain approaches presented in [9] for
estimation of the parameters of single-input single-output, linear
errors-in-variables models given time-domain noisy input-output
data were extended to multivariable models in [10]. Attention was
focused on frequency-domain
approaches  where  the  integrated  polyspectrum  (bispec¬ 
trum  or  trispectrum)  of  the  input  and  the  integrated 
cross-polyspectrum,  respectively,  of  the  given  time-domain 
input-output  data  are  exploited.  In  [10]  identifiability  and 
consistency  issues  for  MIMO  systems  were  considered.  No 
simulation  examples  were  provided  in  [10].  In  this  paper  we 
first  developed  expressions  for  the  covariance  matrix  of  the 
system  transfer  function  estimate  (which  is  based  upon  cer¬ 
tain  integrated  polyspectrum  estimates)  and  showed  that 
the  system  transfer  function  matrix  estimate  is  asymptot¬ 
ically  complex  Gaussian.  Then  we  proposed  and  analyzed 
a  pseudo-maximum  likelihood  (PML)  estimator  of  system 
parameters  using  the  developed  statistics  of  the  system 
transfer  function  estimate.  Finally  two  simulation  exam¬ 
ples  were  provided  to  illustrate  two  approaches:  the  pro¬ 
posed  PML  and  the  equation-error  approach  of  [10]. 

8.  REFERENCES 

[1] M. Deistler, "Symmetric modeling in system identification," in
H. Nijmeijer and J.M. Schumacher (Eds.), Three Decades of
Mathematical System Theory, Lecture Notes in Control & Information
Sciences, Springer, 1989.

[2] R.E. Kalman, "Identifiability and modeling in econometrics," in
Development in Statistics, IV, P.R. Krishnaiah (Ed.), Academic
Press, 1983.

[3]  T.  Soderstrom  and  P.  Stoica,  System  Identification. 
Prentice  Hall  Intern.:  London,  1989. 

[4]  P.  Stoica  and  A.  Nehorai,  “On  the  uniqueness  of  pre¬ 
diction  error  models  for  systems  with  noisy  input- 
output  data,”  Automatica,  vol.  23,  pp.  541-543,  1987. 

[5] J.K. Tugnait, "Stochastic system identification with noisy
input using cumulant statistics," IEEE Transactions on Automatic
Control, vol. AC-37, pp. 476-485, April 1992.


[6]  Y.  Inouye  and  H.  Tsuchiya,  “Identification  of  linear 
systems  using  input-output  cumulants,”  Intern.  J. 
Control,  vol.  53,  pp.  1431-1448,  1991. 

[7] A. Delopoulos and G.B. Giannakis, "Consistent identification of
stochastic linear systems with noisy input-output data,"
Automatica, vol. 30, Aug. 1994.

[8]  Y.  Inouye  and  Y.  Suga,  “Identification  of  linear  sys¬ 
tems  with  noisy  input  using  input-output  cumulants,” 
Intern.  J.  Control,  vol.  59,  pp.  1231-1253,  May  1994. 

[9]  J.K.  Tugnait  and  Y.  Ye,  “Stochastic  system  identi¬ 
fication  with  noisy  input-output  measurements  using 
polyspectra,”  IEEE  Trans.  Automatic  Control,  vol. 
AC-40,  pp.  670-683,  April  1995. 

[10]  J.K.  Tugnait,  “Identifiability  of  multivariable  errors- 
in-variables  models  using  integrated  polyspectrum,” 
in  Proc.  IEEE  Signal  Proc./ATHOS  Workshop  on 
Higher-Order  Statistics,  pp.  39-43,  June  12-14,  1995, 
Begur,  Spain. 

[11] D.R. Brillinger, Time Series: Data Analysis and Theory. New
York: Holt, Rinehart and Winston, 1975.

[12]  L.  Ljung,  System  Identification:  Theory  for  the  User. 
Prentice-Hall:  Englewood  Cliffs,  N.J.,  1987. 


Table I. Example 1: Normalized mean-square errors (NMSE) (in dB)
in estimating the AR and MA coefficients; SNR = 10 dB, averages
over 100 Monte Carlo runs.
OEM: output error method (time-domain); EE: equation-error
[10, Sec. 3.2].

Approach    NMSE_A (dB)           NMSE_B (dB)
            T=1024     T=4096     T=1024     T=4096
EE          -13.45     -21.39     -10.66     -18.35
Sec. 4      -19.48     -36.52     -11.78     -25.35
OEM         -17.54     -18.02     -10.72     -12.73

Table II. Example 2: Normalized mean-square errors (NMSE) (in dB)
in estimating the AR and MA coefficients; SNR = 10 dB, averages
over 100 Monte Carlo runs.
OEM: output error method (time-domain); EE: equation-error
[10, Sec. 3.2].

Approach    NMSE_A (dB)    NMSE_B (dB)
            T=4096         T=4096
EE          -29.78         -16.44
Sec. 4      -39.73         -22.81
OEM          -3.91           4.70
Abstract

In this contribution, a generalization of the Super Exponential
blind deconvolution method is discussed. The generalization
consists in defining a "spikyness" criterion that involves general
nonlinearities rather than only powers. This makes it possible to
rephrase Bussgang deconvolution in the framework of Super
Exponential deconvolution, using a spikyness criterion that takes
into account the pdf of the input series to be recovered. Improved
performance is expected when generalized Super Exponential
deconvolution is tuned to suitable optimality criteria.

I  Introduction 

Among blind deconvolution techniques, Bussgang deconvolution [1,2]
and Super Exponential methods [3] have received great interest due
to their accuracy and their simplicity of formulation and
implementation.

While a certain degree of sub-optimality can be invoked for
Bussgang deconvolution, Super Exponential methods do not rely on
any statistical optimality criterion; rather, they are formulated
as an iterative nonlinear algorithm able to drive the overall
channel+equalizer impulse response toward a (possibly scaled and
time-shifted) unit sample.

Interestingly  enough,  both  the  methods,  by  exploiting 
the  non-Gaussianity  of  the  unknown  i.i.d.  signal  to  be 
recovered,  result  in  iterative  algorithms  which  require  the 
solution  of  a  linear  set  of  equations  involving  higher-order 
moments  and/or  cumulants  of  the  signals  at  hand. 

In this contribution, we establish a link between the two blind
deconvolution methods by showing that Bussgang deconvolution can
be obtained as a generalization of Super Exponential methods that
explicitly takes into account the probabilistic description of the
signal to be recovered.

Recalling the blind deconvolution method described in [4, 5, 6],
where it has also been shown that Bussgang deconvolution is an
approximate implementation of optimal (in the MMSE sense) blind
deconvolution, we also conclude that Super Exponential methods
result in optimal MMSE solutions when the generalized form is
considered. In this case, all the higher-order cumulants are
implicitly used through the expectation of a suitable nonlinearity
depending on the pdf of the unknown signal.

Finally, we note that these estimation methods, based on
non-Gaussianity measured through cumulants or nonlinear moments,
are implemented as simple iterative schemes that carry out the
nonlinear minimization of suitable cost functions. The simplicity
of the iterative scheme makes it possible to monitor the solution
at each iteration step. This gives a concrete possibility of
avoiding solutions corresponding to local minima/maxima (due to
numerical ill-conditioning induced by sample statistics), which is
difficult when the nonlinear optimization is implemented using
(closed) general-purpose optimization software packages. For the
same reason, a statistical analysis can reasonably be performed
[3, 10] despite the analytical complications arising when (sample)
higher-order cumulants are considered.

II  Generalized  Super  Exponential 
Deconvolution 

The key point of this contribution is the generalization of Super
Exponential methods for blind deconvolution, so as to obtain the
solving equations in a form comparable to those obtained in
Bussgang deconvolution. To this end, let us recall the basics of
Super Exponential methods. With reference to Fig. 1, let us denote
by $f[n]$ the impulse response of the equalizer of the linear and
shift-invariant (LSI) transformation $h[n]$ which has linearly
distorted the i.i.d. series $s[n]$; the latter has to be recovered
from the noise-free observed series $x[n]$. Moreover, let us denote
by $c[n] = h[n] * f[n]$ the impulse response of the overall LSI
transformation acting on $s[n]$.

Figure 1: Blind deconvolution.

Any deconvolution method attempts to synthesize $f[n]$ in order to
drive $c[n]$ as close as possible to $a \cdot \delta[n-d]$, i.e.,
to realize an approximately ideal channel apart from a gain $a$ and
a delay $d$. Super Exponential methods iteratively determine
$f_i[n]$ at iteration $i$ in such a way that the overall response
$c_i[n] = h[n] * f_i[n]$ is more "spiky" than at the previous
iteration. The spikyness is obtained by raising to the integer
power $p$ and normalizing^1

$$c_i[n] = (c_{i-1}[n])^p \cdot \big(\mathcal{E}\{(c_{i-1}[n])^p\}\big)^{-1/2} \tag{1}$$

where $\mathcal{E}\{\cdot\}$ extracts the energy of the sequence in
its argument. The normalization in (1) induces the spiking
behaviour of the overall LSI transformation $c_i[n]$, which
naturally tends to $\delta[n-d]$ as iterations proceed.^2 The
resulting LMS equalizer $f_i[n]$ at iteration $i$ is obtained from
the following set of linear equations (see [3])

$$R_x\,\tilde{f}_i = \frac{\kappa_s^{(2)}}{\kappa_s^{(p+1)}}\;\kappa^{(p,1)}_{\hat{s}_{i-1}x} \tag{2}$$

$$f_i = \tilde{f}_i \left( \frac{\tilde{f}_i^{\,T} R_x \tilde{f}_i}{\kappa_s^{(2)}} \right)^{-1/2} \tag{3}$$

where $\kappa_s^{(2)}$ and $\kappa_s^{(p+1)}$ are the second-order
cumulant (variance) and the $(p+1)$-order cumulant of the input
signal $s[n]$, respectively, $[f_i]_m = f_i[m]$ is a vector which
collects the equalizer coefficients,
$[R_x]_{km} = R_x[k-m] = E\{x[k]\,x[m]\}$ is the covariance matrix
of the observed series $x[n]$, and the vector
$[\kappa^{(p,1)}_{\hat{s}_{i-1}x}]_m = \kappa^{(p,1)}_{\hat{s}_{i-1}x}[m]$
collects the diagonal slice of cross-cumulants of order $(p, 1)$

$$\kappa^{(p,1)}_{\hat{s}_{i-1}x}[m] = \mathrm{cum}\big(\underbrace{\hat{s}_{i-1}[n], \ldots, \hat{s}_{i-1}[n]}_{p},\ x[n-m]\big)$$

with $\hat{s}_{i-1}[n] = f_{i-1}[n] * x[n]$ being the estimate of
the unknown series $s[n]$ at iteration $(i-1)$.

Note that the right-hand side of (2) involves the cumulants of
order $(p+1)$ due to the choice of the power of order $p$ in the
spiking criterion (1).
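
The effect of the spiking recursion (1) on an overall response can be seen in a few lines. The sketch below works directly in the response domain (it does not solve the equalizer equations), uses $p = 3$, and starts from a hypothetical four-tap response chosen for illustration; each step raises the taps to the power $p$ and renormalizes to unit energy.

```python
import math


def spike_step(c, p=3):
    """One step of (1): raise each tap to the power p, then normalize the energy."""
    raised = [v ** p for v in c]
    energy = sum(v * v for v in raised)
    return [v / math.sqrt(energy) for v in raised]


c_initial = [0.2, 1.0, -0.5, 0.3]   # hypothetical overall response c_0[n]
c_final = c_initial
for _ in range(5):
    c_final = spike_step(c_final)

print(c_final)
```

After each step the ratio of every secondary tap to the leading tap is raised to the power $p$, so the secondary taps decay at a super-exponential rate, which is where the method takes its name from.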

The generic cross-cumulant of order $(p,1)$ can also be obtained as the following nonlinear “hybrid” moment [7]

$$\kappa^{(p,1)}_{s_{i-1}x}[m] = E\{x[n-m]\,\gamma_p(s_{i-1}[n])\} \qquad (4)$$

where $\gamma_p(s)$ is a suitable polynomial of order $p$, with coefficients depending on the moments (up to the order $p$) of the random variable $s$. For instance, using $\gamma_2(s) = s^2$ yields the third-order slice cross-cumulants for zero-mean series, while $\gamma_3(s) = s^3 - 3\sigma^2 s$ yields the fourth-order cross-cumulants for zero-mean series with variance $\sigma^2$.

¹For notational simplicity and readability, in this paper we refer to the case of real sequences and LSI transformations. The complex case does not involve any special consideration and can be obtained by properly defining the higher-order cumulants.

²Apart from pathological situations of multimodal $c[n]$ having extremal values that are numerically equal.
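As a small numerical illustration of (4), the sketch below evaluates the hybrid moment with $\gamma_3(s) = s^3 - 3\sigma^2 s$ on a toy sequence and compares it with the fourth-order cross-cumulant written out from moments. The function names and the toy data are assumptions made for this example only; sample averages stand in for expectations, and both series are taken to be zero-mean.

```python
def mean(values):
    return sum(values) / len(values)

def hybrid_moment(x, s):
    """Sample version of E{x * gamma_3(s)}, gamma_3(s) = s^3 - 3*sigma^2*s."""
    sigma2 = mean([v * v for v in s])  # sample variance of the zero-mean series s
    return mean([xv * (sv ** 3 - 3.0 * sigma2 * sv) for xv, sv in zip(x, s)])

def cross_cumulant_31(x, s):
    """cum(s, s, s, x) for zero-mean series: E{s^3 x} - 3 E{s^2} E{s x}."""
    m31 = mean([sv ** 3 * xv for xv, sv in zip(x, s)])
    m20 = mean([sv * sv for sv in s])
    m11 = mean([sv * xv for xv, sv in zip(x, s)])
    return m31 - 3.0 * m20 * m11

# Toy data (hypothetical): a balanced binary series and a scaled copy of it.
s = [1.0, -1.0, 1.0, -1.0, -1.0, 1.0]
x = [0.5 * v for v in s]

print(hybrid_moment(x, s), cross_cumulant_31(x, s))
```

Both quantities agree because, once the variance in $\gamma_3$ is taken as the sample variance, the two expressions are the same combination of sample moments.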

Question: What happens if we consider general nonlinearities $g(\cdot)$ in (4) and use these nonlinear moments in the solving equations (2) rather than the $(p+1)$-order cumulants?

The answer is: In the spiking equation (1) another nonlinearity $\eta(\cdot)$ (depending on $g(\cdot)$ but not the same!) is considered in lieu of the power of order $p$!

This result generalizes the Super Exponential methods, including all those nonlinearities $g(\cdot)$ resulting in a spiking action on the overall transformation $c[n]$. The proof of the previous claim relies on the representation of the nonlinear moments $E\{x[n-m]\,g(x[n])\}$ by the cumulant series decomposition introduced in [8, 9]. In fact, let us consider a nonlinearity $g(s_{i-1}[n])$ instead of $\gamma_p(s_{i-1}[n])$ in (4)

$$r_{gx}[m] \triangleq E\{x[n-m]\,g(s_{i-1}[n])\} \qquad (5)$$


and express it through a cumulant series expansion [8, 9]

$$E\{x[n-m]\,g(s_{i-1}[n])\} = \sum_{p=1}^{\infty}\frac{1}{p!}\,E\{g^{(p)}(s_{i-1}[n])\}\,\kappa^{(p,1)}_{s_{i-1}x}[m] \qquad (6)$$

having denoted by $g^{(p)}(\cdot)$ the $p$-order derivative of $g(\cdot)$.
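As a consistency check on the expansion (6), consider the cubic $g(s) = s^3$ and a zero-mean $s_{i-1}[n]$ with variance $\sigma^2$. The shorthand $s = s_{i-1}[n]$ is used here only for illustration. Only the terms $p = 1$ and $p = 3$ survive:

```latex
\begin{aligned}
g'(s) &= 3s^2, \qquad g''(s) = 6s, \qquad g'''(s) = 6, \qquad g^{(p)}(s) = 0 \ \ (p \ge 4),\\
E\{x[n-m]\,s^3\}
 &= \frac{1}{1!}\,E\{3s^2\}\,\kappa^{(1,1)}_{sx}[m]
  + \frac{1}{2!}\,E\{6s\}\,\kappa^{(2,1)}_{sx}[m]
  + \frac{1}{3!}\,6\,\kappa^{(3,1)}_{sx}[m] \\
 &= 3\sigma^2\,E\{s\,x[n-m]\} + \kappa^{(3,1)}_{sx}[m].
\end{aligned}
```

The middle term vanishes because $E\{s\} = 0$. Rearranged, this is the usual moment-to-cumulant relation for the fourth-order cross-cumulant, in agreement with the cubic polynomial quoted for the hybrid moment (4).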

Now, let us consider the generalized problem of determining an equalizer $f_i[n]$ such that the overall response $c_i[n]$ is “spiked” through a nonlinearity $\eta(\cdot)$ rather than a power; expanding $\eta(\cdot)$ in Taylor series, we have:

$$c_i[n] = \eta(c_{i-1}[n]) = \sum_{p=1}^{\infty}\alpha_p\,(c_{i-1}[n])^p \qquad (7)$$

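It is straightforwardly shown that the LMS equalizer is obtained from the following set of “generalized” normal equations

$$R_x\,\mathbf{f}'_i = \sum_{p=1}^{\infty}\alpha_p\,\frac{\sigma_s^2}{\kappa_s^{(p+1)}}\,\boldsymbol{\kappa}^{(p,1)}_{s_{i-1}x} \qquad (8)$$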


Note that (8) has the same structure as (2); the only difference is that the right-hand side of (8) involves all the higher-order cross-cumulants $\boldsymbol{\kappa}^{(p,1)}_{s_{i-1}x}$, weighted by the coefficients of the Taylor expansion of the spiking nonlinearity $\eta(\cdot)$ in (7).

Finally, we see that the right-hand side of (8) coincides with the nonlinear cross-correlations of (6) when

$$\alpha_p = \frac{1}{p!}\,\frac{\kappa_s^{(p+1)}}{\sigma_s^2}\,E\{g^{(p)}(s_{i-1}[n])\}.$$

In other words, the nonlinear “spiking” criterion

$$\eta(c_{i-1}[n]) = \sum_{p=1}^{\infty}\frac{1}{p!}\,\frac{\kappa_s^{(p+1)}}{\sigma_s^2}\,E\{g^{(p)}(s_{i-1}[n])\}\,(c_{i-1}[n])^p$$

is implicitly adopted when the nonlinear cross-correlations (6) are considered in the right-hand side of the normal equations (8) which yield the equalizer. Note also that, invoking again the cumulant series expansion, the spiking criterion (7) can also be written in the following form

$$c_i[n] = \eta(c_{i-1}[n]) = E\{s[n]\,g(s_{i-1}[n])\}$$

where the dependence on $c_{i-1}[n]$ is hidden in the nonlinearity $g(s_{i-1}[n])$.

In summary, the nonlinearity $\eta(\cdot)$ acting on the overall LSI transformation $c_i[n]$ takes into account all the higher-order cumulants of the non-Gaussian input $s[n]$, weighting them by expectations of derivatives of the nonlinearity $g(\cdot)$. Moreover, a performance improvement should be expected if the weights are suitably chosen, i.e. if the nonlinearity $g(\cdot)$ is obtained according to some optimality criterion. For instance, in [3] the performance analysis of the Super Exponential deconvolution shows that sample fourth-order cumulants suffice to obtain zero ISI at convergence when constant modulus constellations are considered as input. In [10], the performance analysis of the generalized Super Exponential deconvolution presented here shows that, for (odd) polynomial nonlinearities of seventh order $g(s) = a\,s + b\,s^3 + c\,s^5 + d\,s^7$, there exist optimal values of the coefficients $(a, b, c, d)$ that result in zero ISI also when non-constant-modulus constellations are considered.
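The spiking action itself is easy to reproduce in isolation. The following sketch applies a nonlinearity tap by tap to a toy overall response and renormalizes to unit energy, once with the cubic power law and once with an arbitrary odd polynomial. The initial response `c0`, the polynomial `odd_poly`, and the residual-ISI measure are all assumptions chosen for illustration; the equalizer fitting step of (2) and (8) is deliberately left out.

```python
import math

def spike_step(c, eta):
    """Apply the nonlinearity eta to every tap, then rescale to unit energy."""
    spiked = [eta(v) for v in c]
    norm = math.sqrt(sum(v * v for v in spiked))
    return [v / norm for v in spiked]

def residual_isi(c):
    """Energy outside the dominant tap, relative to the dominant tap."""
    peak = max(abs(v) for v in c)
    return sum(v * v for v in c) / (peak * peak) - 1.0

def run(c, eta, iterations):
    """Iterate the spiking step and record the residual ISI after each pass."""
    history = [residual_isi(c)]
    for _ in range(iterations):
        c = spike_step(c, eta)
        history.append(residual_isi(c))
    return c, history

c0 = [0.2, -0.5, 0.7, 0.3, -0.1]        # hypothetical initial overall response
cube = lambda v: v ** 3                  # power law with p = 3
odd_poly = lambda v: v + 2.0 * v ** 3    # hypothetical odd polynomial eta

c_cube, isi_cube = run(c0, cube, 5)
c_poly, isi_poly = run(c0, odd_poly, 60)
print(isi_cube[-1], isi_poly[-1])
```

In this toy setting both nonlinearities drive the response towards a single spike at the initially dominant tap; the cubic law does so at a super-exponential rate, while the chosen polynomial, which keeps a linear term, converges more slowly.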

III Bussgang Deconvolution

As outlined above, among the nonlinearities $g(\cdot)$ optimal choices could be made according to suitable optimality criteria. To investigate in this direction, let us briefly recall the Bussgang deconvolution algorithm [1, 2]. The equalizer $\mathbf{f}_i$ is obtained (at iteration $i$) by solving the classical linear set of normal equations (Wiener filter) and normalizing:

$$R_x\,\mathbf{f}'_i = \mathbf{r}_{\hat s_i x} \qquad (9)$$

$$\mathbf{f}_i = \mathbf{f}'_i\,\big(\mathbf{f}'^{T}_i R_x\,\mathbf{f}'_i\big)^{-1/2} \qquad (10)$$

where the vector

$$[\mathbf{r}_{\hat s_i x}]_m = R_{\hat s_i x}[m] = E\{x[n-m]\,\hat s_i[n]\}$$

collects the cross-correlation lags between the observations $x[n]$ and an estimate $\hat s_i[n]$ of the unknown series.

The key idea in Bussgang deconvolution is to obtain a sample estimate of the cross-correlations $E\{\hat s_i[n]\,x[n-m]\}$ using an MMSE estimate of $s[n]$ drawn from the following signal-in-noise model for the Wiener estimate at iteration $(i-1)$

$$z[n] \triangleq s_{i-1}[n] = f_i[n] * x[n] = s[n] + w_i[n] \qquad (11)$$

In (11), the “deconvolutional” noise

$$w_i[n] \triangleq \big(f_i[n] - f_i[0]\,\delta[n]\big) * x[n]$$


is assumed Gaussian, white and independent of $s[n]$. In [1, 2], it is claimed that this assumption can reasonably be sustained at convergence, i.e. when $w_i[n]$ becomes “long and oscillatory”.³ Under these hypotheses, the MMSE estimate of $s[n]$, given $z[n]$ in (11), is the conditional a posteriori mean and does not depend on the time index $n$. Dropping it for simplicity, we have

$$\hat s_i(z) = \int \zeta\,p_{s_i|z}(\zeta)\,d\zeta \qquad (12)$$

where $p_{s_i|z}(\cdot)$ is the conditional a posteriori pdf of $s_i[n]$ given $z[n]$. Note that if the deconvolutional noise $w_i[n]$ is white and independent of $s[n]$, the conditional mean is simply a nonlinearity acting sample-by-sample on $z[n] \triangleq s_{i-1}[n]$, i.e. the zero-memory MMSE estimator takes the form $\hat s_i[n] = g(s_{i-1}[n])$, where the nonlinearity $g(\cdot)$ depends on the pdfs of $s[n]$ and $w_i[n]$. Note that the shape of the conditional estimator (12) changes at each iteration, since it actually depends on the variance of the Gaussian “deconvolutional” noise $w_i[n]$, which, in turn, diminishes as iterations proceed. This estimate is used to form the right-hand side of (9) through the nonlinear cross-correlations⁴

$$R_{\hat s_i x}[m] = E\{g(s_{i-1}[n])\,x[n-m]\} \qquad (13)$$
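For a concrete instance of the zero-memory estimator, take an equiprobable binary input $s = \pm A$ observed in white Gaussian noise of variance $\sigma_w^2$. The conditional mean then reduces to the well-known closed form $A\tanh(Az/\sigma_w^2)$. The sketch below computes it both from the posterior probabilities and from the closed form; the function names are illustrative, and the second noise variance is a hypothetical value chosen only to show how the shape changes as the noise shrinks.

```python
import math

def gaussian_pdf(u, variance):
    return math.exp(-u * u / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def conditional_mean(z, amplitude, noise_var):
    """E{s | z} for s = +/-amplitude (equiprobable), z = s + w, w ~ N(0, noise_var)."""
    like_pos = gaussian_pdf(z - amplitude, noise_var)
    like_neg = gaussian_pdf(z + amplitude, noise_var)
    return amplitude * (like_pos - like_neg) / (like_pos + like_neg)

def conditional_mean_closed_form(z, amplitude, noise_var):
    """Same estimator written as amplitude * tanh(amplitude * z / noise_var)."""
    return amplitude * math.tanh(amplitude * z / noise_var)

amplitude = 0.5
grid = [-1.0, -0.5, -0.1, 0.0, 0.3, 1.0]
for noise_var in (0.1, 0.01):  # 0.01 is a hypothetical "later iteration" value
    row = [round(conditional_mean(z, amplitude, noise_var), 4) for z in grid]
    print(noise_var, row)
```

As the noise variance decreases, the estimator moves from a soft, almost linear characteristic near the origin towards a hard limiter, which is the iteration-dependent behaviour described above.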

Now, look at the solving equations of the two blind deconvolution methods (2) and (9): they are identical in structure, the only difference being the right-hand sides. The generic elements of the right-hand sides are expressed by (4) and (13), respectively: they are nonlinear expectations, the only difference being the nonlinearity, chosen to obtain the cross-cumulants (corresponding to spiking through powers) in (4) and according to the conditional expectation (extraction of signal in white deconvolutional noise) in (13).

In the framework of the generalization of Super Exponential methods described above, we conclude that Bussgang deconvolution obtains the equalizer by “spiking” the overall impulse response $c[n]$ using all the available probabilistic description in the conditional pdf $p_{s_i|z}(\cdot)$. For instance, in Fig. 2 the implicit “spikyness” criterion is plotted in the case of a binary signaling communications scheme, i.e. considering an i.i.d. input series $s[n]$ assuming the values ±0.5 with identical probabilities. The curves in Fig. 2 have been obtained by approximating (in a uniform MMSE sense) the nonlinear estimator (12) as a fifth-order polynomial. The deconvolutional noise $w[n]$ has been assumed white, zero-mean, Gaussian with variance $\sigma_w^2 = 0.1$, i.e. about 4 dB below the binary signal power. The curve labelled “Initial” in Fig. 2 refers to the case of uniformly distributed observations $z[n]$ over $(-1, 1)$, i.e. to the initial iterations of the algorithm; instead, the curve labelled “Final” refers to the case of binary (perfectly equalized) observations $z[n]$, i.e. near the convergence of the algorithm. For comparison purposes, the “Cube” curve is also reported, corresponding to spiking through fourth-order cumulants. We see that the spiking in Bussgang deconvolution is “adapted” to the state of the equalized data: initially, high values are enhanced without suppressing small values (this reflects some initial uncertainty about the true position of the spike), while, near convergence, small values are shrunk without enhancing the already adjusted spike (which is assuming its final value). On the other hand, the cubic law acts on high and small values always in the same fashion during the iterations.

Figure 2: The implicit “spikyness” criterion in Bussgang deconvolution of a binary signaling communications scheme.

³The assumption of Gaussianity of the deconvolutional noise $w_i[n]$ has also been discussed in [11], where it has been shown that Gaussianity cannot be sustained even at convergence. Since the estimator (12) does not properly take into account the pdf of the deconvolutional noise, it has to be considered a suboptimal estimate; it is adopted mainly for its simplicity.

⁴The Bussgang denomination is due to the fact that the equalizer satisfies $\mathbf{f}_i \propto \mathbf{f}_{i-1}$ at convergence, and this in turn implies the invariance of the cross-correlation under a nonlinear transformation: $E\{x[n-m]\,s_i[n]\} \propto E\{g(s_i[n])\,x[n-m]\}$.

IV  Optimality  of  Bussgang  Deconvolution 

Despite its simple formulation, Bussgang deconvolution has been shown to be an approximate algorithmic implementation of an optimal Bayesian solution derived in [4] and reported also in [5, 6]. Loosely speaking, the normal equations (9) and the conditional mean (12) result from the differentiation of the quadratic cost function w.r.t. the unknown equalizer parameters and w.r.t. the unobserved input series $s[n]$, respectively.

This confirms the fact that nonlinearities can be optimally chosen in the generalized Super Exponential blind deconvolution. Bussgang deconvolution offers the possibility of controlling the “spikyness” as iterations proceed by adapting the estimation to the amount of “deconvolutional” noise not yet removed; on the other hand, “fixed”-parameter polynomial nonlinearities can be optimally designed to drive ISI to zero, as the performance analyses conducted in [3, 10] have shown.

V  Conclusion 

A generalization of the Super Exponential blind deconvolution method has been presented. The inherent “spikyness” criterion is allowed to include general nonlinearities rather than only powers as in the original formulation.

This generalization allows Bussgang deconvolution to be interpreted as a particular instance of generalized Super Exponential deconvolution, obtained by selecting the spikyness criterion according to the known pdf of the input series to be recovered.

The usefulness of such an approach is confirmed by performance analysis, which shows that zero ISI can be obtained also when non-constant-modulus constellations are considered.

Abstract

A recursive estimation algorithm for FIR systems is proposed using the 3rd and 4th order cumulants. From the relationship between the 3rd and 4th order cumulants, we construct a matrix form whose entries consist of the system output sequence. Using this matrix form, the proposed recursive algorithm is developed by the Overdetermined Recursive Instrumental Variable (ORIV) method. The proposed algorithm provides improved estimation accuracy when additive Gaussian noise is present and can be applied to a time varying system as well. Simulation results are presented to compare the performance with other HOS-based algorithms.


I.  Introduction

Recently, new identification algorithms using higher order cumulants have appeared in the literature: the GM-method [1], the adaptive HR algorithm based on the GM-equation [2], etc. The GM-method and its modified algorithms show erratic behavior when the output data are corrupted by additive Gaussian noise, since they include the correlation terms. A new estimation algorithm using the 3rd and 4th order cumulants [3] has been proposed to avoid this erratic behavior, alleviating the effect of additive white/colored Gaussian noise. This algorithm uses the 3rd and 4th order cumulants but not the correlation terms. As a result, it is blind to additive white/colored Gaussian noise even if the SNR is low.

In general, estimates of cumulants of order higher than two have high variance, and therefore a long data record is required to obtain reliable estimates of higher order cumulants. Compared with batch algorithms, recursive algorithms have some merits, i.e., the capability of tracking a time varying system and a less strict requirement on the number of data. Hence, we propose a new recursive algorithm for an FIR system as an extension of the algorithm in [3]. First, we transform the relationship between the 3rd and 4th order cumulants into a matrix form with a least squares solution. This matrix consists of 3rd and 4th order cumulants. To derive the recursive estimation algorithm using this matrix form, we reformulate the components of the matrix so that its entries consist of the output sequence, by introducing the instrumental variable. Using the reformulated matrix, we obtain the time recursive estimation algorithm by following the ORIV derivation procedure.

The outline of the paper is as follows. In Section II, we introduce the GM equation and the relationship between the 3rd and 4th order cumulants, and present the proposed recursive algorithm based on the 3rd and 4th order cumulants. In Section III, we present a set of numerical examples illustrating the behavior of the proposed algorithm in comparison with the other algorithms, and show its capability of tracking a time varying system.

II.  Proposed  Algorithm 

Let an observed process be

$$y(n) = x(n) + w(n) = \sum_{i=0}^{q} b(i)\,v(n-i) + w(n) \qquad (1)$$

where $x(n)$ is the system output, $\{b(n)\}$ is the impulse response of an unknown system, and $\{v(n)\}$ is i.i.d. non-Gaussian and has an asymmetric p.d.f. with $E\{v(n)\} = 0$, $\gamma_{3v} = E\{v^3(n)\} \neq 0$, $\gamma_{4v} = E\{v^4(n)\} - 3[E\{v^2(n)\}]^2 \neq 0$. The additive noise $\{w(n)\}$ is Gaussian and independent of $\{v(n)\}$.

The GM equation [1] shows the relationship between the correlation and the cumulant as follows:

$$c_{2y}(m) + \sum_{i=1}^{q} b^2(i)\,c_{2y}(m-i) = \varepsilon_{2,3}\Big[c_{3y}(m) + \sum_{i=1}^{q} b(i)\,c_{3y}(m-i)\Big], \quad \text{for } -q \le m \le 2q \qquad (2)$$

where $c_{3y}(m) \triangleq c_{3y}(m,m) = E\{y(n)\,y^2(n+m)\}$, $c_{2y}(m)$ is the correlation term of the output sequence at lag $m$, and $\varepsilon_{2,3} = \gamma_{2v}/\gamma_{3v}$. The GM-equation-based algorithms perform well for system identification even if the system is nonminimum phase. However, they fail when the output sequence is corrupted by additive noise, because the correlation term is biased, in spite of the fact that the cumulants are blind to additive Gaussian noise. With this motivation, a relationship between the 3rd and 4th order cumulants was proposed in [3]:


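$$c_{3y}(m) + \sum_{i=1}^{q} b^3(i)\,c_{3y}(m-i) = \varepsilon_{3,4}\Big[c_{4y}(m) + \sum_{i=1}^{q} b^2(i)\,c_{4y}(m-i)\Big], \quad \text{for } -q \le m \le 2q \qquad (3)$$

where $\varepsilon_{3,4} = \gamma_{3v}/\gamma_{4v}$ and $c_{4y}(m) \triangleq c_{4y}(m,m,m) = E\{y(n)\,y^3(n+m)\} - 3E\{y(n)\,y(n+m)\}\,c_{2y}(0)$. Eq. (3) can be recast in the following matrix form:

$$A\,\theta = b, \qquad (4)$$

$$A = \begin{bmatrix}
c_{4y}(-q) & 0 & \cdots & 0 & 0 & \cdots & 0\\
c_{4y}(-q+1) & c_{4y}(-q) & \cdots & 0 & c_{3y}(-q) & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\
c_{4y}(q) & c_{4y}(q-1) & \cdots & c_{4y}(0) & c_{3y}(q-1) & \cdots & c_{3y}(0)\\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & c_{4y}(q) & 0 & \cdots & c_{3y}(q)
\end{bmatrix},$$

$$\theta = \big[\varepsilon_{3,4},\; \varepsilon_{3,4}b^2(1),\; \ldots,\; \varepsilon_{3,4}b^2(q),\; -b^3(1),\; \ldots,\; -b^3(q)\big]^T,$$

$$b = \big[c_{3y}(-q),\; \ldots,\; c_{3y}(q),\; 0,\; \ldots,\; 0\big]^T,$$

so that the row of $A$ associated with lag $m$ is $[c_{4y}(m),\, c_{4y}(m-1),\, \ldots,\, c_{4y}(m-q),\, c_{3y}(m-1),\, \ldots,\, c_{3y}(m-q)]$, with cumulant slices vanishing for lags outside $[-q, q]$.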


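where $A$ is a $(3q+1)\times(2q+1)$ matrix and $b$ is a $(3q+1)$ vector. In general, we can obtain the least squares solution of (4) as $\theta = (A^T A)^{-1}A^T b$. To derive the recursive estimation algorithm from (4), we propose, instead, to replace $A$ and $b$ by the following estimates. Define

$$\hat A = Z_n^T\,[\,C_{4,n}\;\; C_{3,n}\,], \qquad \hat b = Z_n^T\,c_{3,n}, \qquad (5)$$

with the $k$-th rows, $k = 0, 1, \ldots, n$ (samples with negative time index are taken as zero),

$$[Z_n]_{k,\cdot} = \big[\,y(k),\; y(k-1),\; \ldots,\; y(k-3q)\,\big],$$

$$[c_{3,n}]_{k} = y^2(k-q),$$

$$[C_{3,n}]_{k,\cdot} = \big[\,y^2(k-q-1),\; \ldots,\; y^2(k-2q)\,\big],$$

$$[C_{4,n}]_{k,\cdot} = \big[\,y^3(k-q) - a\,y(k-q),\; \ldots,\; y^3(k-2q) - a\,y(k-2q)\,\big],$$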


where $a = 3c_{2y}(0)$, and $\hat A$ and $\hat b$ are the estimates of $A$ and $b$, respectively. $Z_n$ acts as the instrumental variable and is an $(n+1)\times(3q+1)$ matrix, $c_{3,n}$ is an $(n+1)\times 1$ vector, and $C_{4,n}$ and $C_{3,n}$ are $(n+1)\times(q+1)$ and $(n+1)\times q$ matrices, respectively. Note that the elements of $Z_n^T C_{4,n}$ and $Z_n^T C_{3,n}$ are consistent estimates of $c_{4y}(m)$ and $c_{3y}(m)$ for $-q \le m \le q$, respectively. This is in agreement with the matrix appearing on the left-hand side of (4). Similarly, $Z_n^T c_{3,n}$ is a consistent estimate of the right-hand side of (4).
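The batch relation (4) can be checked numerically before any recursion is introduced. The sketch below builds $A$ and $b$ from exact cumulant slices of an MA model, using the standard expression $\gamma_k \sum_n b(n)\,b^{k-1}(n+\tau)$ for the diagonal slice of the $k$-th order cumulant, and then solves the normal equations. The MA(1) coefficients and the cumulant values $\gamma_{3v} = 2$, $\gamma_{4v} = 6$ are example inputs, the helper names are hypothetical, and reading $b(1)$ off the $-b^3(1)$ entry of $\theta$ through a cube root is only one possible choice.

```python
import math

def ma_cumulant_slice(b, order, gamma, lag):
    """gamma * sum_n b(n) * b(n + lag)**(order - 1): diagonal cumulant slice of an MA process."""
    total = 0.0
    for n, bn in enumerate(b):
        k = n + lag
        if 0 <= k < len(b):
            total += bn * b[k] ** (order - 1)
    return gamma * total

def build_system(b, gamma3, gamma4):
    """Rows of A and entries of the right-hand side of (4) for m = -q, ..., 2q."""
    q = len(b) - 1
    c3 = lambda m: ma_cumulant_slice(b, 3, gamma3, m)
    c4 = lambda m: ma_cumulant_slice(b, 4, gamma4, m)
    rows, rhs = [], []
    for m in range(-q, 2 * q + 1):
        rows.append([c4(m - i) for i in range(q + 1)]
                    + [c3(m - i) for i in range(1, q + 1)])
        rhs.append(c3(m))
    return rows, rhs

def solve_normal_equations(rows, rhs):
    """Least squares via (A^T A) theta = A^T b, Gaussian elimination with pivoting."""
    n = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, rhs))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    theta = [0.0] * n
    for i in reversed(range(n)):
        acc = aug[i][n] - sum(aug[i][k] * theta[k] for k in range(i + 1, n))
        theta[i] = acc / aug[i][i]
    return theta

b_true = [1.0, -1.25]                       # example MA(1) model with b(0) = 1
theta = solve_normal_equations(*build_system(b_true, gamma3=2.0, gamma4=6.0))
minus_b_cubed = theta[-1]                   # last entry of theta is -b^3(q)
b1 = -math.copysign(abs(minus_b_cubed) ** (1.0 / 3.0), minus_b_cubed)
print(theta, b1)
```

With exact cumulants the solution reproduces $\varepsilon_{3,4} = \gamma_{3v}/\gamma_{4v}$ in the first entry of $\theta$ and recovers $b(1)$; with sample cumulants the same code would return estimates whose accuracy depends on the record length.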

The estimate of the parameter vector is given by the least squares solution to (5):

$$\hat\theta_n = \big(F_n^T Z_n Z_n^T F_n\big)^{-1} F_n^T Z_n Z_n^T\,c_{3,n}, \qquad (6)$$

where $F_n = [\,C_{4,n}\;\; C_{3,n}\,]$. Following the derivation of ORIV [4], the quantities involved in the time update of the parameter vector can be separated as follows:

$$P_n = \big(F_n^T Z_n Z_n^T F_n\big)^{-1}, \qquad S_n = F_n^T Z_n, \qquad L_n = Z_n^T\,c_{3,n}. \qquad (7)$$

The $P_n$ and $S_n$ matrices can be updated as described in the ORIV method. The most significant difference between the proposed algorithm and the algorithm of [2] lies in $L_n$, which corresponds to the $\hat b$ vector. It can be recursively updated by

$$L_{n+1} = \lambda L_n + z_{n+1}\,y^2(n+1), \qquad (8)$$

where $0 < \lambda < 1$ is an exponential forgetting factor which enables the algorithm to track time varying parameters, and $z_{n+1}$ is the $(n+1)$th column vector of $Z_{n+1}^T$. From (7), we note that

$$\hat\theta_{n+1} = P_{n+1} S_{n+1} L_{n+1} = P_{n+1} S_{n+1}\big(\lambda L_n + z_{n+1}\,y^2(n+1)\big). \qquad (9)$$

From the time updates of $P_{n+1}$ and $S_{n+1}$, we obtain the following time update equation of the parameter vector:

$$\hat\theta_{n+1} = \hat\theta_n + K_{n+1}\big(v_{n+1} - \Phi_{n+1}^T\,\hat\theta_n\big), \qquad (10)$$

where $\Phi_{n+1}$ and $\Lambda_{n+1}$ are the same as in the ORIV method, and $v_{n+1} = [\,z_{n+1}^T L_n \;\; y^2(n+1)\,]^T$. Finally, the proposed recursive algorithm using the 3rd and 4th order cumulants is summarized in Table 1. We refer the reader to [4] for details.

In estimating the 4th order cumulant, the estimate of the correlation $c_{2y}(0)$ is needed. The corresponding time update formula would be

$$\hat c_{2y,n+1}(0) = \frac{n\,\hat c_{2y,n}(0) + y^2(n+1)}{n+1} \qquad (11.1)$$

or

$$\hat c_{2y,n+1}(0) = \lambda\,\hat c_{2y,n}(0) + (1-\lambda)\,y^2(n+1). \qquad (11.2)$$

Eqs. (11.1) and (11.2) can be used for the time-invariant system case and the time-varying system case, respectively.
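The difference between the two correlation updates can be seen on a toy sequence whose power changes halfway through. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the running average assumes that the previous estimate is the mean of the `n` squared samples seen so far, and the forgetting factor 0.9 is an arbitrary example value.

```python
def running_average_update(c2_prev, y_new, n):
    """(11.1)-style update: c2_prev is assumed to average n squared samples."""
    return (n * c2_prev + y_new * y_new) / (n + 1)

def forgetting_update(c2_prev, y_new, lam):
    """(11.2)-style update with exponential forgetting factor lam."""
    return lam * c2_prev + (1.0 - lam) * y_new * y_new

# Toy sequence: power 1 for the first 100 samples, power 9 afterwards.
ys = [1.0] * 100 + [3.0] * 100

c2_avg, c2_forget = 0.0, 0.0
for n, y in enumerate(ys):
    c2_avg = running_average_update(c2_avg, y, n)
    c2_forget = forgetting_update(c2_forget, y, lam=0.9)

print(c2_avg, c2_forget)
```

The running average ends at the overall mean power (5 here), whereas the exponentially weighted estimate follows the most recent power level (close to 9), which is why the second form suits the time-varying case.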

III.  Simulation  Results 

Table 1. Summary of the proposed algorithm

Initialization:
  $S_0 = 0$, $(2q+1)\times(3q+1)$;  $P_0 = \ldots$
  $L_0 = 0$, $(3q+1)\times 1$;  $\hat\theta_0$, $(2q+1)\times 1$

For $n = 0, 1, 2, \ldots$
  $z_{n+1} = [\,y(n+1),\; \ldots,\; y(n-3q+1)\,]^T$
  $\hat c_{2y,n+1}(0) = \big(n\,\hat c_{2y,n}(0) + y^2(n+1)\big)/(n+1)$  or  $\hat c_{2y,n+1}(0) = \lambda\,\hat c_{2y,n}(0) + (1-\lambda)\,y^2(n+1)$
  $x_{n+1} = [\,y^3(n-q+1),\; \ldots,\; y^3(n-2q+1) \mid y^2(n-q),\; \ldots,\; y^2(n-2q+1)\,]^T - 3\hat c_{2y,n+1}(0)\,[\,y(n-q+1),\; \ldots,\; y(n-2q+1) \mid 0,\; \ldots,\; 0\,]^T$
  $w_{n+1} = S_n z_{n+1}$,  $S_{n+1} = \lambda S_n + x_{n+1} z_{n+1}^T$
  $\Phi_{n+1} = [\,w_{n+1}\;\; x_{n+1}\,]$,  $\Lambda_{n+1} = \begin{bmatrix} -z_{n+1}^T z_{n+1} & \lambda \\ \lambda & 0 \end{bmatrix}$
  $K_{n+1} = P_n \Phi_{n+1}\big(\Lambda_{n+1} + \Phi_{n+1}^T P_n \Phi_{n+1}\big)^{-1}$
  $v_{n+1} = [\,z_{n+1}^T L_n \;\; y^2(n-q+1)\,]^T$,  $L_{n+1} = \lambda L_n + z_{n+1}\,y^2(n-q+1)$
  $\hat\theta_{n+1} = \hat\theta_n + K_{n+1}\big(v_{n+1} - \Phi_{n+1}^T\,\hat\theta_n\big)$

The proposed algorithm is compared with the recursive algorithm using the 2nd and 3rd order cumulants [2], the batch algorithm in [3], and the least squares solution of (6). The input sequence $\{v(n)\}$, an independent exponentially distributed zero-mean random sequence with $\sigma_v^2 = 1$, $\gamma_{3v} = 2$, and $\gamma_{4v} = 6$, was generated, and additive Gaussian noise was added to the observed output sequence. We performed 100 Monte Carlo runs for each of the algorithms with 3000 data samples. The simulated system was

$$x(n) = v(n) - 1.25\,v(n-1) \quad \text{(nonminimum-phase system)}. \qquad (12)$$

The colored Gaussian noise is generated by passing a white, zero-mean Gaussian noise through an MA filter with coefficients [1, -2.33, 0.75, 0.5, -1.3, -1.4].
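The input statistics quoted for the simulation can be reproduced with a few lines of code. The sketch below draws a zero-mean exponential sequence, estimates its second-, third- and fourth-order cumulants, and passes it through the MA(1) system of (12). The seed and the record length (much longer than the 3000 samples used in the paper, purely to steady the fourth-order estimate) are assumptions of this example, and no additive noise is included.

```python
import random

def sample_cumulants(v):
    """Sample 2nd, 3rd and 4th order cumulants from central moments."""
    n = len(v)
    mu = sum(v) / n
    m2 = sum((x - mu) ** 2 for x in v) / n
    m3 = sum((x - mu) ** 3 for x in v) / n
    m4 = sum((x - mu) ** 4 for x in v) / n
    return m2, m3, m4 - 3.0 * m2 * m2

rng = random.Random(1997)      # arbitrary seed, for repeatability only
num_samples = 200_000          # assumption: far more than the paper's 3000

# Unit-rate exponential shifted to zero mean: cumulants 1, 2 and 6.
v = [rng.expovariate(1.0) - 1.0 for _ in range(num_samples)]

# Noise-free output of the MA(1) system x(n) = v(n) - 1.25 v(n-1).
x = [v[n] - 1.25 * v[n - 1] for n in range(1, num_samples)]

k2, k3, k4 = sample_cumulants(v)
print(round(k2, 3), round(k3, 3), round(k4, 3))
```

The spread of the fourth-order estimate is visibly larger than that of the second-order one even at this record length, which is consistent with the remark in the Introduction about the variance of higher order cumulant estimates.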

Table II. Simulation results (true parameter b(1) = -1.25, N = 3000)

Algorithm                       SNR:   0 dB      5 dB      10 dB     ∞ dB
proposed algorithm      mean         -1.1473   -1.2711   -1.2763   -1.2767
                        std.          0.4944    0.1049    0.0843    0.0747
[2] (2nd and 3rd)       mean         -1.3721   -1.4368   -1.4442   -1.3444
                        std.          0.4629    0.1769    0.1432    0.1274
[3] batch algorithm     mean         -1.0753   -1.1305   -1.1308   -1.1568
                        std.          0.0451    0.0339    0.0261    0.0286
batch alg. of Eq. (4)   mean         -1.0753   -1.1257   -1.1524   -1.1564
                        std.          0.0514    0.0315    0.0253    0.0276

The solution of (6) does not directly tell us the sign of $b(k)$. Two choices exist; we avoid the one that divides by the estimate of $\varepsilon_{3,4}$ to obtain the estimates of $b^2(k)$, since a small $\hat\varepsilon_{3,4}$ may give rise to estimates of very large magnitude.

The obtained curve is depicted in Fig. 1, and the statistics (mean and standard deviation) of the estimated coefficients are listed in Table II.


Fig. 1. The mean trajectories of the estimated parameters in the 5 dB SNR case, plotted against iteration number. Solid line: true; dotted line: proposed; dashed-dotted line: algorithm in [2] (2nd and 3rd).

As we see, the batch algorithm based on [3] and the recursive algorithm in [2] show consistent performance in the noiseless case. However, when the output sequence is corrupted by additive Gaussian noise, they estimate the parameter with bias. On the contrary, the proposed algorithm shows consistent performance even if it is not known whether the additive Gaussian noise is white or colored. Compared with the batch algorithms, the mean estimates of the recursive algorithm are closer to the true value. However, we can see that the standard deviations of the estimates obtained with the recursive algorithm are larger than those obtained with the batch algorithms. From Fig. 1, we can see that the proposed algorithm converges after about 1000 samples, which is smaller than the data record length required by the batch algorithms.

For the identification of a time varying system, we initially generated data using $x(n) = v(n) - 0.8\,v(n-1)$, and switched after 2500 data samples to another MA system described by the above nonminimum-phase system. Fig. 2 shows the time history of $\hat b(1)$. Note that the proposed algorithm indeed tracks the variation of the MA parameter, with some delay.


Fig. 2. The mean trajectory of the estimated parameter in a time varying case, plotted against iteration number. Solid line: true; dotted line: proposed.

IV.  Conclusions 

A cumulant-based recursive parameter estimation algorithm for MA system identification was considered. We have developed an ORIV-type recursive algorithm and applied it to a particular set of equations involving the 3rd and 4th order cumulants. Since the proposed algorithm does not use the correlation terms, it is blind to additive white/colored Gaussian noise. Hence, the proposed algorithm is useful when the SNR is low, or when a priori information about the noise is not available. In addition, the proposed algorithm showed satisfactory performance when estimating a time varying MA system.

Abstract

Techniques based on conventional higher-order statistics fail when the underlying processes become impulsive. Although methods based on fractional lower-order statistics (FLOS) have proven successful in dealing with heavy-tailed processes, they fail in general when the noise distribution has very heavy algebraic tails, i.e., when the algebraic tail constant is close to zero. In this paper we introduce a signal processing framework that we call Zero-Order Statistics (ZOS). ZOS are well defined for any process with algebraic or lighter tails, including the full class of α-stable distributions. We introduce zero-order scale and location statistics and study several of their properties. The intimate link between ZOS and FLOS is presented. We also show that ZOS are the optimal framework when the underlying processes are very impulsive.

All figures, simulations and source code utilized in this paper are reproducible and freely accessible on the Internet at http://www.ee.udel.edu/~gonzalez/PUBS/HOS97a


1.  Introduction 

Second-order processes have historically been the main subject of study in statistical signal processing. Second-order-based estimation techniques are commonly recognized as the natural tools to be used in the presence of Gaussian noise. Research efforts on higher-order statistics (HOS) have led to the development of improved estimation algorithms for non-Gaussian environments, but this work has been based on the assumption that second-order and higher-order statistics of the processes exist and are finite [7]. Important non-Gaussian impulsive processes, found in radar and mobile communications for example, can be efficiently modeled by infinite variance processes for which the theory of HOS is not useful [3, 4, 5, 6, 8].

*This research was funded in part by the National Science Foundation under Grant MIP-9530923, and through collaborative participation in the Advanced Telecommunications/Information Distribution Research Program (ATIRP) Consortium sponsored by the U.S. Army Research Laboratory under Cooperative Agreement No. DAAL01-96-2-0002.

It has been shown repeatedly in the literature that heavy-tailed processes that appear in practice are well modeled by probability distributions with algebraic tails, i.e., random variables for which $\Pr(|X| > x) \sim c\,x^{-\alpha}$ as $x \to \infty$, for some fixed $c, \alpha > 0$. Examples of such noise processes include those modeled by α-stable distributions [8], Hall's generalized t-model [3], the generalized Cauchy model [6], and the Pareto distribution. The tail-heaviness of these distributions is mainly determined by the tail constant α, with increased impulsiveness corresponding to smaller values of α.

Algebraic-tailed random variables exhibit finite absolute moments for orders less than α; i.e., $E|X|^p < \infty$ if $p < \alpha$. Conversely, if $p \ge \alpha$, the absolute moments become infinite, making them unsuitable for statistical analysis. When α < 2, the processes present infinite variance, and the standard second or higher-order statistics cannot be successfully applied. Attempts to characterize the behavior of impulsive signals in this scenario have relied on fractional lower-order statistics (FLOS) in the context of non-Gaussian α-stable distributions (α < 2). Here, given a fixed α, appropriate choices of $p$ in the interval (0, α) can lead to useful characterizations of the process structure [8].
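For a concrete instance of the moment condition, consider a Pareto variable with $\Pr(X > x) = x^{-\alpha}$ for $x \ge 1$ (the unit scale is chosen here only for illustration). Its density is $\alpha x^{-\alpha-1}$, so that

```latex
E\,X^{p} \;=\; \int_{1}^{\infty} x^{p}\,\alpha\,x^{-\alpha-1}\,dx
\;=\;
\begin{cases}
\dfrac{\alpha}{\alpha-p}, & p < \alpha,\\[6pt]
\infty, & p \ge \alpha .
\end{cases}
```

With $\alpha = 1.5$, for example, the mean is finite ($E\,X = 3$) but $E\,X^2$ diverges, so the variance is infinite and only fractional moments of order $p < 1.5$ are available.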

While FLOS have proven useful when dealing with impulsive signals, they present several shortcomings. First, FLOS do not provide a universal framework for dealing with algebraic-tailed processes. Since $p$ is usually restricted to the interval (0, α), constructing a valid FLOS requires previous knowledge (or estimation) of α in order to pick an appropriate value of $p$. On the other hand, for any given $p > 0$, there will always be a “remaining” class of very impulsive processes (those with α < p) for which the associated FLOS are not appropriate.

In this paper we introduce the preliminaries of a new theory of statistics which is well defined over all distributions with algebraic or lighter tails. Unlike lower or higher-order statistics, these zero-order statistics or ZOS, as we will call them, provide a common ground for the analysis of basically any known distribution of practical use. In the same way as $p$th-order moments constitute the basis of FLOS and HOS techniques, the theory of zero-order statistics is based on logarithmic “moments” of the form $E \log|X|$. We study the fundamental properties of ZOS and prove their optimality when the tail constant α tends to zero. This important property suggests the strong potential of ZOS in the robust estimation of very impulsive processes.

2.  Logarithmic-Order  Processes 

The following theorem provides a natural motivation for characterizing the class of processes of interest in this paper:

Theorem 1 Let $X$ be a random variable with algebraic or lighter tails. Then $E \log|X| < \infty$.

Proof See [1].

Since our main goal is to develop a signal processing framework under which all random processes with algebraic tails can be characterized, Theorem 1 allows us to restrict our attention to processes with finite logarithmic moments. We will refer to such processes as being of “logarithmic order”, in analogy with the term “second order”, used to denote processes with finite variance.
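The contrast between logarithmic and second-order moments can be observed directly on simulated data. The sketch below draws standard Cauchy samples (an algebraic-tailed law with α = 1, for which $E \log|X| = 0$) and compares the sample logarithmic moment with the sample second moment over growing prefixes. The seed, the sample sizes, and the function names are assumptions of this example.

```python
import math
import random

def standard_cauchy(rng):
    """Inverse-CDF sampling of a standard Cauchy variable (tail constant 1)."""
    return math.tan(math.pi * (rng.random() - 0.5))

def log_moment(values):
    """Sample estimate of E log|X|."""
    return sum(math.log(abs(v)) for v in values) / len(values)

def second_moment(values):
    """Sample estimate of E X^2 (infinite for the Cauchy law)."""
    return sum(v * v for v in values) / len(values)

rng = random.Random(0)  # arbitrary seed, for repeatability only
# Exact zeros are discarded so that log|x| is always defined.
samples = [x for x in (standard_cauchy(rng) for _ in range(100_000)) if x != 0.0]

for size in (1_000, 10_000, 100_000):
    prefix = samples[:size]
    print(size, round(log_moment(prefix), 3), round(second_moment(prefix), 1))
```

The logarithmic moment settles near zero as the record grows, while the sample second moment has no finite limit and typically keeps jumping whenever a large outlier enters the record.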

Of particular interest in this paper is the class of logarithmic-order processes with infinite variance, which includes all the algebraic-tailed processes with tail constant α < 2. Classical statistical methods, usually derived from the theory of second-order processes, present serious limitations when dealing with this kind of process. For example, design techniques based on the minimization of the mean square error (MSE) become practically useless, since the MSE turns out to be infinite all the time. Least squares regression (LS) and general linear techniques are also widely acknowledged as inappropriate in infinite variance environments [8]. Spectral characterizations, of paramount importance in engineering applications, are no longer meaningful, since the (second-order) power of the processes becomes infinite. Even the mean parameter $\mu = EX$, a basic ingredient of classical probability theory, does not exist for algebraic-tailed distributions with α < 1. Also, fundamental estimators of location and scale such as the sample average and the sample standard deviation present serious shortcomings. The sample average rapidly loses efficiency when the tail constant departs from α = 2 to smaller (more impulsive) values. For tail constants α ≤ 1 the sample average becomes inconsistent, and it is more harmful than useful when α < 1. The sample standard deviation, on the other hand, is not consistent for any α < 2.

¹Due to lack of space, we have omitted the proofs in this paper, limiting
the discussion to the most significant insights. The reader is referred to the
paper website [1] for detailed proofs.


The evident limitations of classical methods in the context
of logarithmic-order processes make it necessary to
develop new basic statistics under which an efficient theory
of estimation can be built.

3.  The  Geometric  Power 

In this section we introduce a new scale indicator, namely
the geometric power², which overcomes many of the
limitations of second-order theory in the framework of
logarithmic-order processes. Next, in Section 4, a location
indicator intimately linked with the geometric power
is developed. These two parameters constitute the
underpinnings of the theory of ZOS. Much as the (second-order)
power and the expected value play a central role in the theory
of second-order processes, we believe that the geometric
power and its related location parameter are of fundamental
importance in the development of a theory for
logarithmic-order processes.

Definition 1 Let X be a logarithmic-order random variable.
The geometric power of X is defined as

$$S_0 = S_0(X) = e^{E \log|X|}. \qquad (1)$$

3.1.  Properties 

It  can  be  easily  shown  that  the  geometric  power  is  a 
scale  parameter,  and  as  such,  it  can  be  effectively  used  as  an 
indicator  of  process  strength  or  “power”  in  situations  where 
second-order  methods  are  inadequate.  In  the  following,  we 
enumerate  some  of  the  most  important  properties  of  this  new 
parameter.  The  reader  is  referred  to  the  paper  website  [1] 
for  the  proofs. 

1. $S_0$ is a scale parameter. For any given constant c,
$S_0(cX) = |c|\,S_0(X)$.

2. $S_0$ is an indicator of “power” or process strength.
For all X, $S_0(X) \ge 0$. For any given constant c,
$S_0(c) = |c|$. In addition, $S_0(X) = 0$ if and only if
Pr(X = 0) > 0, which implies that zero power is
only attained when there is a discrete probability mass
located at zero.

3. Triangular inequality. For any pair of random variables
X and Y, $S_0(X + Y) \le S_0(X) + S_0(Y)$.

4. Chebyshev inequality. Let X be a symmetric random
variable with symmetry center μ, such that
$S_0(X - \mu) > 1$. Then,

Pr(|X-/i|>/<=>5o(X-iu))<-.  (2) 

c 

5. Multiplicativity. For any pair of random variables X
and Y, $S_0(XY) = S_0(X)\,S_0(Y)$.

²We coined the name geometric because of the intimate link of this
parameter with the geometric mean. Unfortunately there is not enough
space in this paper to illustrate this link.
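The properties listed above can be checked numerically on data. The following is a minimal sketch, assuming that the sample counterpart of $S_0$ is obtained by replacing the expectation in (1) with a sample average (that is, the geometric mean of the absolute sample values); the function name `geometric_power` is hypothetical and not part of the paper.

```python
import math
import random

def geometric_power(samples):
    """Sample analogue of S_0 = exp(E log|X|): geometric mean of |x_i|.

    Assumption: the expectation in (1) is replaced by a sample average.
    """
    if any(x == 0 for x in samples):
        return 0.0  # a zero sample drives the geometric mean to zero
    return math.exp(sum(math.log(abs(x)) for x in samples) / len(samples))

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(1000)]
y = [random.gauss(0.0, 3.0) for _ in range(1000)]

s_x = geometric_power(x)
s_y = geometric_power(y)
s_scaled = geometric_power([-2.5 * v for v in x])        # Property 1
s_prod = geometric_power([a * b for a, b in zip(x, y)])  # Property 5
s_const = geometric_power([4.0] * 10)                    # S_0(c) = |c|

print(s_x, s_scaled / s_x, s_prod / (s_x * s_y), s_const)
```

For the sample version, the scale and multiplicativity properties hold exactly (up to rounding), because the logarithm turns products into sums before averaging.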


3.2.  Geometric  power  of  a-stable  processes 

Alpha-stable processes constitute one of the most important
infinite-variance families in the logarithmic-order class.
Since they are the only processes that satisfy a form of
Generalized Central Limit Theorem, they can appear in practice
as a result of natural stochastic phenomena. For an introduction
to α-stable distributions in a signal processing context
see [8].

The geometric power of a zero-centered symmetric α-stable
(SαS) distribution can be calculated as

$$S_0 = \frac{(C_g \gamma)^{1/\alpha}}{C_g}, \qquad (3)$$

where $C_g = \exp(C_e) \approx 1.78$ is the exponential of the Euler
constant, γ is the dispersion parameter, and α is the
characteristic exponent of the distribution. A simple derivation of
this expression is available on the website of this paper [1].
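As a sanity check of the closed-form expression, the two SαS cases with elementary densities can be simulated directly. The sketch below assumes that (3) reads $S_0 = (C_g \gamma)^{1/\alpha} / C_g$ with the characteristic function $\exp(-\gamma|\omega|^\alpha)$, so that a Gaussian of standard deviation σ has γ = σ²/2 and a Cauchy variable has γ equal to its scale; the helper names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329
C_G = math.exp(EULER_GAMMA)  # exponential of the Euler constant, about 1.78

def sas_geometric_power(alpha, gamma):
    """Assumed reading of (3): S_0 = (C_g * gamma)**(1/alpha) / C_g."""
    return (C_G * gamma) ** (1.0 / alpha) / C_G

def sample_geometric_power(samples):
    """Geometric mean of the absolute sample values."""
    return math.exp(sum(math.log(abs(x)) for x in samples) / len(samples))

random.seed(1)
n = 200_000
sigma = 1.5          # Gaussian case: alpha = 2, gamma = sigma**2 / 2
cauchy_scale = 2.0   # Cauchy case: alpha = 1, gamma = scale

gauss = [random.gauss(0.0, sigma) for _ in range(n)]
cauchy = [cauchy_scale * math.tan(math.pi * (random.random() - 0.5))
          for _ in range(n)]

s0_gauss_formula = sas_geometric_power(2.0, sigma ** 2 / 2.0)
s0_gauss_sample = sample_geometric_power(gauss)
s0_cauchy_formula = sas_geometric_power(1.0, cauchy_scale)
s0_cauchy_sample = sample_geometric_power(cauchy)

print(s0_gauss_formula, s0_gauss_sample)
print(s0_cauchy_formula, s0_cauchy_sample)
```

In both cases the sample geometric power should agree with the formula up to Monte Carlo error, even though the Cauchy sample has no finite variance.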

Figure 1 illustrates the soundness of the geometric power
as an indicator of signal strength. The scatter on the left side
was generated from a stable distribution with α = 1.99 and
$S_0 = 1$. On the right-hand side, the scatter comes from a
Gaussian distribution (α = 2), also with unit geometric
power. From an intuitive inspection of Fig. 1, it is reasonable
to conclude that both generating processes possess
the same strength, in accordance with the values of the
geometric power. In contrast, the values of the second-order
power lead to the misleading conclusion that the process on
the left is much stronger than the one on the right.

It is easy to find examples like the above that also disqualify
FLOS-based indicators of strength in the logarithmic-order
framework. In fact, fractional moments of order p
present the same type of discontinuity as the one illustrated
in Fig. 1 for processes with tail constants close to
α = p. The geometric power, on the other hand, is continuous
over the whole range of values of α, which makes it
more intuitively appealing in the context of logarithmic-order
processes. This “universality” of the geometric power provides
a general framework for comparing the strengths of
any pair of logarithmic-order signals, in the same way as the
(second-order) power is used in the classical framework.


3.3.  Relation  with  FLOS 

The geometric power is intimately linked to FLOS parameters,
as indicated in the following:

Theorem 2 Let $S_p = (E|X|^p)^{1/p}$ denote the scale parameter
derived from the p-th order moment of X. If $S_p$ exists
for sufficiently small values of p, then

$$S_0 = \lim_{p \to 0} S_p. \qquad (4)$$

Furthermore, $S_0 \le S_p$ for any p > 0.

Proof: See [1].

Figure 1. Comparison of second-order power
vs. geometric power for i.i.d. α-stable processes.
Left: α = 1.99. Right: α = 2.

The  above  result  indicates  that  techniques  derived  from 
the  geometric  power  are  the  limiting  zero-order  relatives 
of  FLOS.  We  coined  the  name  “Zero-Order  Statistics”  as  a 
consequence  of  this  property. 
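The limiting relation between FLOS scale parameters and the geometric power can be illustrated on a sample. The sketch below is only an illustration under stated assumptions: expectations are replaced by sample averages, Cauchy data (α = 1) are used as an example of an infinite-variance process, and the function names are hypothetical.

```python
import math
import random

def flos_scale(samples, p):
    """Sample version of S_p = (E|X|^p)^(1/p), for p > 0."""
    return (sum(abs(x) ** p for x in samples) / len(samples)) ** (1.0 / p)

def geometric_power(samples):
    """Sample version of S_0 = exp(E log|X|)."""
    return math.exp(sum(math.log(abs(x)) for x in samples) / len(samples))

random.seed(2)
# Cauchy samples: infinite variance, finite logarithmic moment.
x = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(5000)]

s0 = geometric_power(x)
orders = [1.0, 0.5, 0.1, 0.01, 0.001]
sp = [flos_scale(x, p) for p in orders]

for p, s in zip(orders, sp):
    print(f"p = {p:<6} S_p = {s:.4f}   (S_0 = {s0:.4f})")
```

For a fixed sample, the values of $S_p$ decrease toward $S_0$ as p shrinks, which is the power-mean inequality at work: the geometric mean is the limit of the power means as the order goes to zero.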

4.  Zero-Order  Location  Parameter 

In the framework of second-order processes, the mean
of a random variable can be conveniently described as the
parameter $\mu_2$ that minimizes the power of the shifted variable
X − μ. That is,

$$\mu_2 = \arg\min_{\mu} S_2(X - \mu). \qquad (5)$$

In the same way, the geometric power motivates the definition
of a location parameter of fundamental importance in
the theory of logarithmic-order processes:

Definition 2 Let X be a logarithmic-order variable. We
define the zero-order indicator of location as the parameter
$\mu_0$ that minimizes the geometric power of the shifted variable
X − μ. That is,

$$\mu_0 = \arg\min_{\mu} S_0(X - \mu). \qquad (6)$$

4.1.  Zero-Order  Estimation  of  Location 

The definition of the ZOS location parameter in (6) motivates
the development of a new estimator with strong potential
for locating very impulsive processes. Substituting
(1) in (6), we can reformulate the definition of $\mu_0$ as

$$\mu_0 = \arg\min_{\mu} E \log|X - \mu|. \qquad (7)$$

In order to propose an estimator of $\mu_0$, an intuitive
consideration of (7) suggests replacing the expected
value operator by the sample average, so that we obtain

$$\hat{\mu}_0 = \arg\min_{\mu} \sum_{i=1}^{N} \log|x_i - \mu|. \qquad (8)$$

This expression has the problem that its argument equals
−∞ at every sample value (i.e., whenever $\mu = x_i$), which
results in the minimization giving a multiple tie among all
the samples. Although this tie is a serious problem that has
to be resolved, it gives us very important information: $\hat{\mu}_0$
is a “selection-type” estimator, in the sense that it is always
equal to one of the sample values³. In order to resolve the
tie, we can constrain the minimization problem in (8) to a
closed set that does not include indeterminacies of the cost
function. This can be done by defining

$$\hat{\mu}_\delta = \arg\min_{\mu \in A_\delta} \sum_{i=1}^{N} \log|x_i - \mu|, \qquad (9)$$

where

$$A_\delta = \mathbb{R} - \bigcup_{i=1}^{N} (x_i - \delta,\ x_i + \delta), \qquad (10)$$

ℝ is the set of real numbers and δ is a small positive constant.
The tie is easily resolved by letting

$$\hat{\mu}_0 = \lim_{\delta \to 0} \hat{\mu}_\delta. \qquad (11)$$

Mathematical manipulation of (11) leads to a surprisingly
simple expression for $\hat{\mu}_0$:

$$\hat{\mu}_0 = \arg\min_{x_j \in M} \prod_{i=1,\ x_i \neq x_j}^{N} |x_i - x_j|, \qquad (12)$$

where M is the set of most repeated values in the sample⁴.
According to (12), $\hat{\mu}_0$ will always be one of the most repeated
values in the sample, resembling the behavior of a sample
mode. This mode property, as we call it, intuitively suggests
that the estimator is highly effective in the presence of
heavy impulsive noise. In the following sections we report
both theoretical and experimental evidence supporting this intuition.
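The selection rule in (12) is short enough to sketch directly. The code below is an illustrative implementation under stated assumptions: the product in (12) is taken over the samples that differ from the candidate $x_j$, repeated values are detected by exact equality, and the product is evaluated as a sum of logarithms for numerical safety. The function name `zos_location` is hypothetical.

```python
import math
from collections import Counter

def zos_location(samples):
    """Selection-type location estimate following (12).

    Among the most repeated sample values (the set M), return the one
    minimizing the product of distances to the samples that differ from it.
    """
    counts = Counter(samples)
    top = max(counts.values())
    candidates = [v for v, c in counts.items() if c == top]  # the set M

    def log_cost(xj):
        # Sum of logs instead of a raw product, to avoid overflow/underflow.
        return sum(math.log(abs(xi - xj)) for xi in samples if xi != xj)

    return min(candidates, key=log_cost)

print(zos_location([1.0, 2.0, 10.0]))              # three samples
print(zos_location([0.1, 0.3, 0.35, 0.4, 250.0]))  # one large outlier
print(zos_location([5.0, 5.0, 1.0, 2.0, 3.0]))     # a repeated value
```

With three distinct samples the estimate coincides with the sample median; with a single large outlier it stays inside the cluster of small values; and a repeated value is selected regardless of where it lies, which is the mode behavior described above.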

4.2.  Properties  of  the  ZOS  location  estimator 

In addition to the “selection” and mode properties reported
above, $\hat{\mu}_0$ presents a rich set of interesting properties.
In the following we enumerate some of the most important
ones. The interested reader is referred to the paper website
[1] for discussion of the proofs.

³This “selection” property, shared also by the median, would make $\hat{\mu}_0$
very appropriate for image processing applications [9, 2].

⁴See the paper website [1] for a rigorous derivation of this formula.


1. Shift and scale invariance. Let $z_i = a x_i + b$,
for i = 1, ..., N. Then
$\hat{\mu}_0(z_1, \ldots, z_N) = a\,\hat{\mu}_0(x_1, \ldots, x_N) + b$.

2. No overshoot/undershoot. The following bounds always
hold for $\hat{\mu}_0$:

$$x_{(2)} \le \hat{\mu}_0 \le x_{(N-1)}, \qquad (13)$$

where $x_{(i)}$ denotes the i-th order statistic of the sample.
A direct consequence of this property is stated next.

3. If N = 3, $\hat{\mu}_0$ is equivalent to the sample median.

4. Consistency. Under very general regularity conditions
on the distribution of the underlying process, $\hat{\mu}_0$ is
a consistent estimator of $\mu_0$. When the underlying
distribution is symmetric, $\hat{\mu}_0$ converges asymptotically
to the center of symmetry.

5. Optimality in very impulsive environments. Let
$\{f_\alpha\}_{\alpha>0}$ denote a family of symmetric and algebraic-tailed
density functions parameterized by the tail constant
α. Let $T_\alpha(x_1, \ldots, x_N)$ denote the maximum
likelihood estimator associated with $f_\alpha$. Then,

$$\lim_{\alpha \to 0} T_\alpha(x_1, \ldots, x_N) = \hat{\mu}_0(x_1, \ldots, x_N), \qquad (14)$$

independently of the density functions $f_\alpha$.

The last two properties constitute a solid theoretical
argument compelling the use of ZOS in estimation problems
that involve very heavy tails. As should be expected, the
proofs are very elaborate and require advanced mathematical
tools. The interested reader is referred again to the paper
website [1] for details.

A word of caution is in order here. Being mode-type,
$\hat{\mu}_0$ presents a poor breakdown point, in the sense that two
outliers with exactly the same value may have catastrophic
effects on the performance of the estimator. However, for
random variables with continuous distributions this does not
represent a serious danger. As stated by Properties 4
and 5, the possibility of such an event does not preclude the
estimator from being consistent and efficient over a wide
spectrum of logarithmic-order distributions.

5.  Performance  of  ZOS  in  a-stable  noise 

The performance of the ZOS location estimator was
comparatively evaluated using Monte Carlo simulation. Figure 2
shows the estimated mean absolute errors (MAE) of the
sample mean, the sample median and the ZOS estimator
when used to locate an i.i.d. symmetric α-stable sample of
size 5. The values of α ranged from 2 (Gaussian case) down to
0.3 (very impulsive). Values close to 2 indicate a distribution
close to the Gaussian, in which case the sample mean
outperforms both the median and the ZOS estimator. As
α is decreased, the noise becomes more impulsive and the
sample mean rapidly loses efficiency, being outperformed
by the sample median for values of α less than 1.7. More
interestingly, as α approaches 1, the estimated MAE of the
sample mean explodes. In fact, it can be shown that, for
α < 1, it is more efficient to use any of the sample values
than the sample mean itself, making the estimator totally
useless. As α continues to decrease, the sample median
progressively loses efficiency with respect to the ZOS
estimator, and at α ≈ 0.87 the ZOS estimator begins to outperform
the sample median. This is an expected result given the
optimality of the ZOS estimator for small values of α. For
the last value in the plot, α = 0.3, the ZOS estimator has an
estimated efficiency ten times larger than the median. This
increase in relative efficiency is expected to grow without
bound as α approaches 0 (the optimality point of the ZOS
estimator). Unfortunately, the variance of the MAE also
increases for decreasing values of α, making evaluation
via simulation difficult. More theory must be developed
that allows the evaluation and comparison of estimators for
small values of α.
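An experiment of this kind can be sketched in a few lines. The code below is not the simulation used for Figure 2; it is a minimal sketch under stated assumptions: SαS samples with unit dispersion are drawn with the Chambers-Mallows-Stuck method, the ZOS estimate follows (12) for samples with distinct values, and the number of trials and the seed are arbitrary. All function names are hypothetical.

```python
import math
import random
import statistics

def sas_sample(alpha, rng):
    """Chambers-Mallows-Stuck draw from a standard SaS law
    (unit dispersion, zero location), for 0 < alpha <= 2."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)  # Cauchy case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def zos_location(samples):
    """ZOS estimate per (12), assuming all sample values are distinct
    (true with probability one for continuous distributions)."""
    return min(samples, key=lambda xj: sum(
        math.log(abs(xi - xj)) for xi in samples if xi != xj))

def mae_table(alpha, n=5, trials=20000, seed=0):
    """Estimated mean absolute error of three location estimators
    for samples of size n centered at zero."""
    rng = random.Random(seed)
    err = {"mean": 0.0, "median": 0.0, "zos": 0.0}
    for _ in range(trials):
        x = [sas_sample(alpha, rng) for _ in range(n)]
        err["mean"] += abs(statistics.fmean(x))
        err["median"] += abs(statistics.median(x))
        err["zos"] += abs(zos_location(x))
    return {name: total / trials for name, total in err.items()}
```

Calling, for example, `mae_table(2.0)` and `mae_table(0.5)` gives one column of such a comparison each. For small α the estimated MAE values fluctuate strongly from run to run, which is the evaluation difficulty noted above, so a single run should not be read as a precise efficiency figure.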

6.  Conclusions 

We have introduced the preliminaries of a theory of
zero-order statistics (ZOS) which is sound and consistent
for all processes with finite logarithmic moments. This
“logarithmic-order” class includes impulsive-type processes
modeled by algebraic tails, and basically embraces all the
probability models with known practical use. Two new
parameters, the geometric power and the zero-order location,
constitute the underpinnings of ZOS theory, playing a role
similar to that of the power and the expected value in the
context of second-order processes. Properties of the new parameters
were reported, and a novel mode-type estimator with
optimality properties in very impulsive processes was directly
derived from the zero-order location. Given the limitations
of second-order and fractional-order theory, ZOS are an
attractive alternative in signal processing applications with
infinite-variance processes.

Further  work  is  to  be  done  in  developing  a  theory  of 
zero-order  correlation  that  successfully  attacks  estimation 
problems  in  very  impulsive  non  i.i.d.  (bursty)  environments. 
More  work  is  also  necessary  in  developing  estimators  of 
ZOS  parameters,  establishing  a  spectral  theory,  and  defining 
more  convenient  evaluators  of  estimator  performance  in  very 
impulsive  noise. 
Abstract

The paper focuses on the problem of multilevel
digital signal estimation in the presence of generic noise
in a communication system. The noise is assumed to be
unimodal, independent and identically distributed, and
generically non-Gaussian, that is, possibly asymmetric,
impulsive or not. The proposed solution is based on a
previously developed estimator which requires an analytical
probability density function model of the noise.

The selected estimator was originally applied under
the assumption of an SαS noise distribution. In this paper,
the asymmetric generalized Gaussian (agg) model is
selected as a suitable model to describe the noise
processes; hence, it is discussed and compared with the
SαS distributions in terms of decoding performance.

Tests were performed on simulated binary sequences
corrupted by interference generated as SαS processes.
Test results show comparable performance for the two
families of parametric noise models.

1.  Introduction 

The present work addresses multilevel digital signal
estimation in the presence of generic noise.

This problem is relevant in a variety of fields, in
particular those involving telecommunication problems
(e.g., radar, sonar, etc.), in which many processes cannot
be realistically described as Gaussian.

For example, if a communication receiver was built
to work under conditions of Gaussian noise and is
subjected to non-Gaussian (impulsive or not)
interference, its performance degrades strongly and may
become unacceptable.

Most previous works proposed solutions under the
assumption of Gaussian or Gaussian-mixture noise.
Other recent works ([1] and related papers) focused
on the estimation of signals corrupted by impulsive
interference, by first computing the a-posteriori density
and then using Symmetric-α-Stable (SαS) distributions
in the estimator.


Indeed, the stable model was validated with a variety
of real impulsive data, which are very common since they are
generated by both natural and man-made sources, and was
shown to fit all of them well [2]. Many works
investigated the analytical properties and good data-fitting
capabilities of SαS distributions, mainly in
detection and estimation problems [2][3]. Other papers
were published about the generation of processes with
SαS distribution [2][3] and the estimation of parameters
of alpha-stable interference [4].

Although a generic, possibly asymmetric version of the
alpha-stable model was formulated [2], signal processing
applications were commonly based on the particular case
of the symmetric SαS probability density function (pdf).

To take generalized cases into account, the present
approach, starting from the estimator proposed in [1],
extends its use to arbitrary, possibly non-impulsive
and/or asymmetric noise, modeled by a parametric pdf
having a very simple analytical closed form.

This model depends on a few parameters of the Second
and Higher Order Statistics (SOS and HOS) [5], to be
estimated from data. It is the so-called asymmetric
generalized Gaussian (agg) pdf [6], and derives from the
asymmetric Gaussian and the generalized Gaussian [7]
pdfs.

Its main characteristics are: dependence on
statistical parameters up to the fourth order, which are
simple to estimate and allow the model to describe
densities with variable skewness and shape (in terms of
sharpness and tail heaviness); mathematical similarities
with the Gaussian function, hence inheriting many
properties of the Gaussian pdf; and the robustness to
accurately fit many real distributions.

In this paper, the selected model is briefly outlined; in
particular, its analytical properties are discussed and
compared with the capabilities of the SαS model from a
theoretical point of view.

Results obtained from numerical simulations by
applying the selected estimator under the two noise
model assumptions are presented and evaluated in terms
of accuracy.
i = 1, ..., L,  k = 1, 2, ..., N  (4)


2.  The  problem  of  signal  estimation 

The problem of multilevel digital signal estimation is
mathematically formalized as follows.

The scalar discrete state-space system [5]:

$$\begin{cases} X_{k+1} = A_k X_k + W_k \\ Z_k = H_k X_k + V_k \end{cases} \qquad k = 1, 2, \ldots, N, \quad A_k, H_k \in \mathbb{R} \qquad (1)$$

is considered, where $\{W_k\}$ is the independent
identically distributed (iid) noise having pdf $p_W(\cdot)$ and
added to the model of interference $\{X_k\}$, and $\{V_k\}$ is the
L-level signal, with levels indexed by i = 1, 2, ..., L.

The sequences $\{W_k\}$ and $\{V_k\}$ are assumed to be mutually
independent. Let $Z^k = \{Z_1, Z_2, \ldots, Z_k\}$ be the set of possible
observations up to time k; for any fixed k, the past
measurement $z^k = \{z_i : Z_i = z_i,\ i = 1, 2, \ldots, k\}$ is given. $\hat{X}_k$ is
the filtered estimator of the true $X_k$, and $\hat{X}_{k+1}$ its prediction.
The objective of the algorithm is to find $\hat{X}_k$ so that the
error $|X_k - \hat{X}_k|$ is minimum, thus removing interference
from observations and recovering the signal.
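To make the data model concrete, the system can be simulated directly. The sketch below assumes that the state-space model has the form $X_{k+1} = A_k X_k + W_k$, $Z_k = H_k X_k + V_k$ with constant coefficients, a zero initial state, signal levels −1 and +1 drawn with equal probability, and Cauchy noise (the α = 1 member of the SαS family) as a simple stand-in for impulsive interference. These choices and all names are illustrative assumptions, not the authors' test setup.

```python
import math
import random

def simulate(levels, a, h, n, noise, rng):
    """Simulate X_{k+1} = a*X_k + W_k and Z_k = h*X_k + V_k for k = 1..n.

    V_k is drawn uniformly from `levels`; W_k is produced by noise(rng).
    Returns the interference, signal and observation sequences.
    """
    x = 0.0  # hypothetical initial state X_1 = 0
    xs, vs, zs = [], [], []
    for _ in range(n):
        v = rng.choice(levels)
        xs.append(x)
        vs.append(v)
        zs.append(h * x + v)
        x = a * x + noise(rng)
    return xs, vs, zs

def cauchy_noise(rng, scale=0.2):
    """SaS noise with alpha = 1 (Cauchy); other values of alpha would
    require a general stable generator."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(0)
xs, vs, zs = simulate([-1.0, 1.0], a=0.1, h=1.0, n=1000,
                      noise=cauchy_noise, rng=rng)
print(zs[:5])
```

The estimator's task is then to recover `vs` from `zs` alone, which requires tracking the interference `xs` through its (non-Gaussian) a-posteriori density.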


2.1.  The  recursive  estimator 


Different types of estimators can be designed once
the a-posteriori pdf $p(x_k \mid z^k)$ has been computed. The
method proposed by Shen and Nikias in [1] is selected
here: it depends explicitly on the analytical model of the
noise pdf, hence taking into account possible
non-Gaussianity and allowing one to select the model
that best fits the data.

In this section a brief description of the method is
included in order to clarify how the selected parametric
models are employed in expressing analytically the
pdf $p_W(\cdot)$ of the generic noise $\{W_k\}$.

By  applying  the  Bayesian  law,  a  useful  theorem  was 
proved  from  (1):  given  the  initial  a-priori  pdf  Px^i.^\l> 
the  a-posteriori  pdfs  are  uniquely  determined  by: 

k=3,4,...(2) 


k=2,3,...(?) 


given 


and  the  initial  densities: 

,  ,  Pxi  (^I )Pz\IX\  (^1  ^  -^1 ) 


(5) 


^i^xQiPxi  ihj)Pwi^2  ... 

St, 

An optimality criterion based on the $L^p$-distance was
selected, for which $\hat{X}_k(Z^k)$ must be chosen so that

$$\| X_k - \hat{X}_k(Z^k) \|_p = \min, \qquad p \in (0, +\infty).$$

In the present work attention is focused only on the
p = 1 case (i.e., the so-called AVC method), which is considered
less restrictive regarding the error distribution and practically
equivalent in performance to the p = 2
case [1]. The AVC procedure follows.

After initialization of the densities, for k = 1, 2, ..., N repeat:

1.  From  the  data  set  Z*  and  the  signal  levels  {dj  the 

measurement  set  i=l,2,...,L}  is  computed. 

2.  The  numerical  weights  P „ 

AkfZ 

i=l,2,...,L  of  the  L  -distance  can  be  evaluated  from 
the  mentioned  theorem. 

3.  The  objective  function 

d (^)  =  S  ^iPxkiz'‘~^  ^  ^  ~ 

can  be  computed  for  any  data  point  in  where  x 
is  the  current  interference  value  to  estimate.  Its 
optimum  value,  xf'(z*),  minimizing  J(x),  allows 
one  to  recover  the  signal  value  . 

4.  Finally,  p  can  be  predicted  from  (2). 

Xk+\fZ 

It is clear that the algorithm needs a realistic
analytical model of the $\{W_k\}$ distribution.


3.  Noise  modeling  for  estimation  optimization 

A parametric pdf is proposed for modeling the $\{W_k\}$
distribution. It is compared, in terms of statistical and
mathematical properties and of estimation performance,
with the SαS distribution [2], whose application in the
selected estimator was presented and discussed in [1].


3.1. The asymmetric generalized Gaussian (agg) pdf

A parametric pdf is selected that is able to fit, in an
analytically simple and realistic way, iid unimodal generically
non-Gaussian noise, whether impulsive or not, symmetric or
not.
For describing deviation from symmetry, it was found
[7] that the information provided by the third-order
parameter skewness is equivalent to that yielded by the
combination of the empirical left and right variances:

(8)

(9)

where $m_x$ is the pdf mode, and $N_l$ ($N_r$) is the number of
$x_k$ samples $< m_x$ ($> m_x$). If they are inserted in the
generalized Gaussian model, which has variable shape in
terms of sharpness, the asymmetric generalized Gaussian
is derived:


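$$p_{agg}(x) = \begin{cases} \dfrac{c\,\gamma_a}{\Gamma(1/c)} \exp\{-\gamma_l^c\,[-(x - m_x)]^c\}, & x < m_x \\[2ex] \dfrac{c\,\gamma_a}{\Gamma(1/c)} \exp\{-\gamma_r^c\,(x - m_x)^c\}, & x \ge m_x \end{cases} \qquad (10)$$

where:

$$\gamma_l = \frac{1}{\sigma_l} \left[ \frac{\Gamma(3/c)}{\Gamma(1/c)} \right]^{1/2}, \qquad \gamma_r = \frac{1}{\sigma_r} \left[ \frac{\Gamma(3/c)}{\Gamma(1/c)} \right]^{1/2}, \qquad (11)$$

$$\gamma_a = \frac{1}{\sigma_l + \sigma_r} \left[ \frac{\Gamma(3/c)}{\Gamma(1/c)} \right]^{1/2}. \qquad (12)$$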
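The agg density is simple to evaluate numerically. The sketch below assumes the following form for (10)-(12): a common amplitude $c\,\gamma_a / \Gamma(1/c)$ and one-sided exponents $(\gamma_l |x - m_x|)^c$ and $(\gamma_r |x - m_x|)^c$ on the left and right of the mode, with $\gamma_l$, $\gamma_r$, $\gamma_a$ built from $\sigma_l$, $\sigma_r$ and the ratio $\Gamma(3/c)/\Gamma(1/c)$. This form is an assumption consistent with unit area and with the Gaussian case c = 2; the function names are hypothetical.

```python
import math

def agg_pdf(x, mode, sigma_l, sigma_r, c):
    """Asymmetric generalized Gaussian density (assumed form):
    p(x) = c*g_a/Gamma(1/c) * exp(-(g_side*|x - mode|)**c),
    with g_side = g_l left of the mode and g_r right of it."""
    k = math.sqrt(math.gamma(3.0 / c) / math.gamma(1.0 / c))
    g_l, g_r = k / sigma_l, k / sigma_r
    g_a = k / (sigma_l + sigma_r)
    g = g_l if x < mode else g_r
    return c * g_a / math.gamma(1.0 / c) * math.exp(-((g * abs(x - mode)) ** c))

def integrate(f, lo, hi, steps=200_000):
    """Plain trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# Skewed, heavier-than-Gaussian example: the area should be close to one.
area = integrate(lambda t: agg_pdf(t, 0.0, 1.0, 2.0, 1.5), -60.0, 60.0)

# Symmetric case with c = 2: should match the standard normal density.
gauss_value = agg_pdf(0.3, 0.0, 1.0, 1.0, 2.0)
normal_value = math.exp(-0.5 * 0.3 ** 2) / math.sqrt(2.0 * math.pi)
print(area, gauss_value, normal_value)
```

The two checks (unit area for unequal left and right spreads, and reduction to the Gaussian density for c = 2 with equal spreads) are the properties that motivate the assumed form.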


An approximated analytical model matching the
theoretical parameter c with the kurtosis was achieved
[7]; it was developed in the limit case of $\sigma_l^2 = \sigma_r^2$, but it
is  proved  to  be  suitable  also  for  any  other  variance 
combinations  since  the  MSB  value  between  the  c-based 
and  the  Pr  based  pdfs  is  always  of  the  order  of  10'^. 

The  kurtosis-based  agg  pdf  is  defined  for  It 

is  continuous,  and  derivable  everywhere.  It  was  proved 

[8]  to  be  approximately  stable  in  the  case  of  >05^3,  i.e. 
when  it  is  heavy-tailed  and  refers  to  impulsive  variables. 


The parameter α governs the heaviness of the pdf
tails; the smaller its value, the heavier the tails. In this
sense it is, at least intuitively, related to the kurtosis (and
to the c parameter of the generalized Gaussian pdf). For
particular values of α the family reduces to some limit
cases: for α = 2 a Gaussian function and for α = 1 a Cauchy
model are obtained. For other α values the model does not
have a closed-form pdf, which can be a significant limitation
in its analytical use, although the pdf can be computed
numerically as suggested in [4].

The dispersion γ is related to the spread of the SαS
model, which is the same information described by the
variance in the Gaussian and generalized Gaussian pdfs.

The location parameter δ is the pdf point of symmetry,
hence it is analogous to the mean value.

The SαS pdfs also present similarities with the
Gaussian model, first of all the stability property, which
is maintained for any non-Gaussian SαS pdf [4].

However, only the moments of order p < α of the related
variable are finite (conversely, all the Gaussian
moments are finite). This property is rigorously true only
in theory: in most practical applications, in which a
finite, however large, number of data samples with finite,
however big, values is available, moments can be
estimated also in the case of impulsive stochastic
variables.

The agg pdf class extends the SαS capabilities by also
taking into account non-impulsive (sub-Gaussian) pdfs,
although a lower kurtosis bound is imposed.

Both the SαS distribution and the agg pdf can be
easily extended to multivariate cases. Their common
main limitations are unimodality and the need to
estimate their parameters on the basis of long series of
data.


3.2.  Comparison  with  the  SaS  pdf  model 

A comparison with the α-stable distributions [4] can be
useful, as they were selected in [1] for validating the
estimator employed here under impulsive-noise
conditions.

The SαS pdf $p_{\alpha,\gamma,\delta}(x)$ of the stochastic variable x can
be defined by the inverse Fourier transform integral:

$$p_{\alpha,\gamma,\delta}(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \varphi(\omega)\, e^{-j\omega x}\, d\omega \qquad (13)$$

and is completely determined by fixing three
parameters: the characteristic exponent α (0 < α ≤ 2), the
dispersion γ (γ > 0), and the location parameter δ (δ ∈ ℝ).
The corresponding characteristic function is given by the
exponential:

$$\varphi(\omega) = \exp\{ j\delta\omega - \gamma|\omega|^{\alpha} \}. \qquad (14)$$
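Since the SαS pdf generally lacks a closed form, it is usually evaluated by numerical inversion of the characteristic function. The following is a minimal sketch, assuming the characteristic function $\exp\{j\delta\omega - \gamma|\omega|^\alpha\}$; by symmetry the inversion integral reduces to a cosine integral on the positive axis, which is truncated and evaluated here with the trapezoidal rule. The function name and the truncation settings are illustrative choices, not the numerical method of [4].

```python
import math

def sas_pdf(x, alpha, gamma, delta=0.0, w_max=60.0, steps=60_000):
    """Numerically invert phi(w) = exp(j*delta*w - gamma*|w|**alpha).

    Uses p(x) = (1/pi) * integral_0^inf exp(-gamma*w**alpha) * cos(w*(x - delta)) dw,
    truncated at w_max and evaluated with the trapezoidal rule.
    """
    h = w_max / steps
    shift = x - delta
    # End-point terms: the integrand equals 1 at w = 0.
    total = 0.5 * (1.0 + math.exp(-gamma * w_max ** alpha) * math.cos(w_max * shift))
    for i in range(1, steps):
        w = i * h
        total += math.exp(-gamma * w ** alpha) * math.cos(w * shift)
    return total * h / math.pi

# Closed-form limit cases for comparison.
cauchy_numeric = sas_pdf(0.7, alpha=1.0, gamma=1.0)
cauchy_exact = 1.0 / (math.pi * (1.0 + 0.7 ** 2))
gauss_numeric = sas_pdf(1.0, alpha=2.0, gamma=1.0)
gauss_exact = math.exp(-1.0 / 4.0) / (2.0 * math.sqrt(math.pi))  # N(0, 2) density
print(cauchy_numeric, cauchy_exact, gauss_numeric, gauss_exact)
```

The default truncation is adequate for α near 1 or above; for small α the integrand decays slowly and a much larger `w_max` (or a dedicated method) would be needed.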


4.  Simulation  results  and  discussion 

In the simulation tests, the noise density is assumed to be
agg and the performance is compared with that obtained under
the SαS assumption proposed in [1]. The comparison is carried
out on data corrupted by simulated impulsive noise.

A single random two-level signal $\{V_k\}$
($P_{-1} = P_{+1} = 1/2$) of length N (N = 1000) was generated by using
standard uniform distribution routines. The experimental
results were obtained by taking this sequence as a
common reference for all the runs.

Several noise sequences $\{W_k\}$ of the same length N
were generated as SαS processes [3]: the characteristic
exponent α varies within the range [0.4, 2.0], the
dispersion γ goes from 0.2 up to 5.0, while the location
parameter is fixed to 0.

Figure 1 plots one of the generated SαS noise
sequences, given α = 1.5, γ = 2.0, δ = 0. The estimated agg

pdf  (jU,=-0.02,  m=-0A3,  erf  =25.2,  (T,"=37.4,  >02^68.1), 
superimposed  on  the  related  histogram,  shows  a  good 
curve  fitting  (see  Fig.  2). 


Figure 1. One realization of SαS noise
(α = 1.5, γ = 2.0, δ = 0). Horizontal axis: time.


Figure  2.  Fitting  of  the  agg  pdf  with  the  data  histogram. 

In order to apply the agg pdf model, it is necessary to
estimate its characteristic parameters from each noise
sequence. This operation is performed by simply using
the operative definitions of the parameters.


Figure 3. $\beta_2$ estimated in terms of the α parameter from
SαS sequences for some fixed γ values.


Figure 3 presents the statistical $\beta_2$ trends as α varies
for some γ values, obtained by averaging the estimates over
50 sequences, each composed of 10000 samples.

The system (1) was applied for obtaining the
interference and the measurement data, with $A_k = 0.1$ and $H_k = 1$
fixed for each run and for each discrete time instant k.
The same number (M = 10) of independent runs was performed
for each combination of the parameters α and γ,
and the obtained results were averaged.


Figure 4. (a) SαS and (b) agg decoding performance for
combinations of α and γ.


Figure 4 shows the correct-decoding probability [0-1]
as obtained by modeling noise as an SαS (Fig. 4a) and an
agg (Fig. 4b) variable. Results are shown for
combinations of the parameters α and γ.

Some examples are extracted from Fig. 4 when γ is
fixed (Fig. 5a) and when α is fixed (Fig. 5b).

The performances are comparable and show that both
approaches are robust and accurate as α varies, even in
the case of very impulsive noise.

Hence, the agg model is shown to be adequate for
representing the distribution of a variety of stochastic
processes.

Notice that the comparison is performed under the
ideal condition of exact knowledge of the SαS parameters,
while in real applications they should be estimated, as the
agg parameters are.


Figure  5.  Comparison  of  the  correct  decoding  probability 
for  different  fixed  values  of  (a)  y  and  (b)  a. 

5.  Conclusions  and  future  perspectives 

The paper has addressed the problem of multilevel digital signal estimation in the presence of generic noise. Noise has been assumed unimodal, iid, and generically non-Gaussian, that is, possibly asymmetric, impulsive or not. The proposed solution consists in using a known estimator requiring the analytical pdf model of the interference as a parameter. To this end, the asymmetric generalized Gaussian pdf has been selected, discussed and compared with the SαS distributions, which present appreciable analytical characteristics and fit symmetric impulsive noise well. The selected estimator was originally applied under the assumption of an SαS noise distribution; hence this model provides a valuable reference for validating the adequacy of the agg pdf in solving signal estimation problems.

Tests have shown comparable performances of the two families of parametric models; the main advantage of the agg model lies in its capability of representing and fitting well a wide set of possible real distributions by taking into account not only variable tail heaviness (including sub-Gaussianity, disregarded by the SαS family), but also possible deviation from symmetry.

Future  activities  concern  the  extension  of  the  selected 
model  to  multivariate  cases,  which  is  feasible  thanks  to  its 
analytical  similarities  with  the  Gaussian  function,  and 
consequent  application  to  multidimensional  signals,  e.g. 
images. 
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Abstract 

The alpha-stable distribution is a theoretical model for impulsive noise that currently enjoys wide success. In this paper, we test its applicability to high resolution radars that are capable of resolving the fine structure of the sea surface. The sea clutter signal received by such systems is not well modeled by a Gaussian process, and we expected that stable distributions might provide a better description of the noise statistics than conventional non-Gaussian models such as the K-distribution. However, in the low probability of false alarm region, which is the important one for radar, we found that the K-distribution fits the sea-clutter amplitude statistics better than the alpha-stable distribution. For the application considered, we explain this failure of the stable model on the basis of analytical stable noise modeling.


1  Problem  Formulation 

Although stable distribution models have recently been used often in statistical signal processing, their histogram-fitting and modeling flexibility have in most cases been analyzed only theoretically [8], [10]. In this paper, we fill this gap by attempting to fit stable distributions to radar clutter data. We compare the fit of the stable model to the real-world data with that of the K-distribution model. To achieve this, we first present a computationally efficient method to evaluate the cumulative distribution function (cdf) of stable random variables (RVs) based on the Fourier-Bessel expansion. Then, we employ different sets of scanning radar data for model validation. The data sets used are from the OHGR'93 IPIX database of DREO. The detailed description of the IPIX radar and the database can be
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found  in  [9].  In  general,  we  had  available  the  sequences 
of in-phase (I) and quadrature (Q) receiver outputs, which formed a stationary realization of a bivariate, circularly symmetric (CS) random variable (RV). The objective was to compare the fit of the K-distribution and the fit of the envelope of a CS alpha-stable RV to the amplitude statistics of the observed clutter.

Our conclusion from the data analysis conducted is that, in the low probability of false alarm region, the K-distribution fits sea-clutter amplitude statistics better than the stable distribution. The theoretical explanation for this lies in the physics of the clutter generating process. Specifically, the spatio-temporal distribution of scattering centers contributing to noise corresponds better to the analytical K-distributed noise modeling than to the stable noise modeling introduced in [4].

2 Preliminaries of Alpha-stable Distributions

The univariate symmetric α-stable (SαS) RV with "zero-mean" is defined based on its characteristic function (cf) [8]:

$$\phi(\omega) = \exp(-\gamma |\omega|^{\alpha}), \tag{1}$$

where α is the characteristic exponent, and the dispersion γ is a quantity analogous to the variance. The characteristic exponent controls the heaviness of the probability density function (pdf) tails: a small positive value of α indicates severe impulsiveness, while a value close to 2 indicates a more Gaussian type of behavior. In this paper, we are primarily concerned with the bivariate circularly symmetric (CS) α-stable random vectors for which the cf is given as [8]

$$\phi(\omega_1, \omega_2) = \exp\left(-\gamma\, |\omega_1^2 + \omega_2^2|^{\alpha/2}\right) = \exp\left(-\gamma \|\omega\|^{\alpha}\right), \tag{2}$$

where ||ω|| is the L2 norm of ω.
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The  estimation  of  parameters  for  CS  a-stable  distri¬ 
butions  is  described  in  [7]. 

A  circularly  symmetric  distribution  is  the  two  di¬ 
mensional  (2-D)  case  of  a  wider  class  of  spherically 
symmetric  distributions. 

Definition 1 An n x 1 random vector X in R^n is said to have a spherically symmetric (SS) distribution if for every Γ ∈ O(n), ΓX has the same distribution as X, where O(n) denotes the set of n x n orthogonal matrices:

$$\forall\, \Gamma \in O(n): \quad \Gamma X \stackrel{d}{=} X. \tag{3}$$

Here $\stackrel{d}{=}$ means that the two sides have the same distribution.

From Definition 1 follows an important property needed to discuss SS RVs further [5]:

Proposition 1 An n x 1 RV X has a spherical distribution if and only if its characteristic function φ(t) satisfies one of the following equivalent conditions:

1. φ(Γt) = φ(t) for any Γ ∈ O(n);

2. There exists a function ψ(·) of a scalar variable such that φ(t) = ψ(||t||), where ||t|| is the L2 norm of t.

We write X ~ S_n(ψ) to mean that X has a cf of the form ψ(||t||), where ψ(·) is a function of a scalar variable, called the characteristic generator of the spherical distribution. For SS stable distributions, the characteristic generator is of the form ψ(x) = exp(-γ|x|^α).

The main problem, when we start to work with α-stable distributions, is that no closed-form expressions exist for the pdfs and cdfs of α-stable RVs, except for the Gaussian (α = 2), Cauchy (α = 1) and Levy (α = 1/2) distributions. There are convergent (valid for x << 1) or asymptotic (valid for x >> 1) power series representations for the cdf of stable distributions [10]; however, they are difficult to handle in numerical calculations because of the oscillating character of the coefficients in the series. In the following section, we demonstrate how to use the Fourier-Bessel series expansion for the efficient calculation of the cdf for the envelope of a bivariate CS α-stable RV.

3 The Fourier-Bessel Series in Calculating PDF

In general, the pdf of an n x 1 random vector X is related to the characteristic function φ(t) by the n-D inverse Fourier transform

$$p_X(x) = p(\|x\|) = (2\pi)^{-n} \int \exp(j x^T t)\, \phi(t)\, dt, \tag{4}$$

where the integration is over the whole t space. The change of co-ordinates in (4) from Cartesian to polar for SS distributions results in the following relation [6]:

$$p(r) = (2\pi)^{-n/2}\, r^{1-n/2} \int_0^{\infty} \lambda^{n/2}\, \psi(\lambda)\, J_{(n-2)/2}(\lambda r)\, d\lambda, \tag{5}$$

where r = ||x|| = (x^T x)^{1/2} and J_k(·) represents the ordinary Bessel function of the first kind of order k. The relation (5) is usually referred to as the Hankel transform. In the same fashion, it can be shown that [6]

$$\psi(\lambda) = (2\pi)^{n/2}\, \lambda^{1-n/2} \int_0^{\infty} r^{n/2}\, p(r)\, J_{(n-2)/2}(\lambda r)\, dr. \tag{6}$$

In applications considered in this paper, we are more interested in the pdf P(r) and cdf F(r) of ||X||. One of the consequences of Proposition 1 is that the pdf P(r) of ||X|| is related to the pdf of X as

$$P(r) = \frac{2 \pi^{n/2}}{\Gamma(n/2)}\, r^{n-1}\, p(r). \tag{7}$$

Therefore, from (5) and (6), we get

$$P(r) = \frac{2^{1-n/2}\, r^{n/2}}{\Gamma(n/2)} \int_0^{\infty} \lambda^{n/2}\, \psi(\lambda)\, J_{(n-2)/2}(\lambda r)\, d\lambda, \tag{8}$$

$$\psi(\lambda) = \Gamma(n/2) \left(\frac{2}{\lambda}\right)^{n/2-1} \int_0^{\infty} r^{1-n/2}\, P(r)\, J_{(n-2)/2}(\lambda r)\, dr. \tag{9}$$

If ψ(·) is known, we will calculate first P(r) as in (8) using a Fourier-Bessel series approximation. Assuming that P(r) = 0 for r > Y, (8) can be written as [2]:

$$P(r) = \frac{2^{-n/2+2}\, r^{n/2}}{Y^{n/2+1}\, \Gamma(n/2)} \sum_{k=1}^{\infty} \frac{c_k^{n/2-1}\, F_k}{J_{n/2}^2(c_k)}\, J_{n/2-1}\!\left(\frac{c_k r}{Y}\right), \tag{10}$$

where c_k is the kth real, positive zero of J_{n/2-1}(·), and F_k is the Fourier-Bessel coefficient

$$F_k = \Gamma(n/2) \left(\frac{2Y}{c_k}\right)^{n/2-1} \int_0^{Y} r^{1-n/2}\, P(r)\, J_{n/2-1}\!\left(\frac{c_k r}{Y}\right) dr. \tag{11}$$

The Fourier-Bessel coefficient is almost the same as the characteristic function in (9) evaluated at c_k/Y, so we may try to replace F_k with ψ(c_k/Y). Now, integrating in (10) term-by-term, the cdf of ||X|| is

$$F(r) = \sum_{k=1}^{\infty} \frac{2^{-n/2+2}\, r^{n/2}\, c_k^{n/2-2}\, \psi(c_k/Y)}{Y^{n/2}\, \Gamma(n/2)\, J_{n/2}^2(c_k)}\, J_{n/2}\!\left(\frac{c_k r}{Y}\right). \tag{12}$$
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In  numerical  calculations,  the  series  in  (12)  is  truncated 
at some term L. Another source of error is the replacement of F_k with ψ(c_k/Y), which corresponds to aliasing in calculating an FFT. The detailed analysis of truncation errors and coefficient approximations can be found in [2]. The dimension we are interested in in this paper is n = 2. In this case, the zeros c_k are not available in closed form, and we use approximations from [1]. For k = 1, ..., 20, the c_k are calculated in [1], and the rest are well approximated by the series $c_k \approx (k\pi - \pi/4) + \frac{1}{8(k\pi - \pi/4)}$.
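As an illustration of the truncated Fourier-Bessel series for the envelope cdf in the case n = 2, the following is a minimal Python sketch, not the authors' implementation. It assumes the characteristic generator ψ(λ) = exp(-γλ^α), in which case the series reduces to F(r) = (2r/Y) Σ_k ψ(c_k/Y) J_1(c_k r/Y) / (c_k J_1²(c_k)) with c_k the zeros of J_0. To stay within the standard library, J_0 and J_1 are evaluated from Bessel's integral and the zeros are refined by Newton's method from the asymptotic approximation; the function names and the choices of Y and L are hypothetical.

```python
import math

def bessel_j(order, x):
    """Integer-order Bessel function J_order(x) from Bessel's integral
    J_n(x) = (1/pi) * int_0^pi cos(n*tau - x*sin(tau)) dtau (trapezoid rule)."""
    m = 64 + int(abs(x))                               # more nodes for larger arguments
    total = 0.5 * (1.0 + math.cos(order * math.pi))    # end points tau = 0 and tau = pi
    for i in range(1, m):
        tau = math.pi * i / m
        total += math.cos(order * tau - x * math.sin(tau))
    return total / m

def j0_zero(k):
    """k-th positive zero of J_0: asymptotic start value, then Newton steps."""
    beta = (k - 0.25) * math.pi
    c = beta + 1.0 / (8.0 * beta)
    for _ in range(5):
        c += bessel_j(0, c) / bessel_j(1, c)           # since J_0'(x) = -J_1(x)
    return c

def envelope_cdf(r, alpha, gamma, Y, L):
    """Truncated series for the cdf of the envelope of a bivariate CS
    alpha-stable vector, assuming the envelope pdf vanishes for r > Y."""
    total = 0.0
    for k in range(1, L + 1):
        c = j0_zero(k)
        psi = math.exp(-gamma * (c / Y) ** alpha)      # characteristic generator at c_k / Y
        total += psi * bessel_j(1, c * r / Y) / (c * bessel_j(1, c) ** 2)
    return 2.0 * r / Y * total

if __name__ == "__main__":
    # Sanity check in the Gaussian case (alpha = 2, gamma = 1), where the
    # envelope is Rayleigh with cdf 1 - exp(-r^2 / (4 * gamma)).
    print(envelope_cdf(2.0, 2.0, 1.0, 12.0, 30), 1.0 - math.exp(-1.0))
```

For heavy-tailed cases (α < 2) the truncation point Y and the number of terms L would have to be chosen much larger, in line with the error analysis referred to in [2].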

4  Data  Analysis 

The cdf of the K-distribution is $F(\rho) = 1 - \frac{2}{\Gamma(\nu)} (b\rho/2)^{\nu} K_{\nu}(b\rho)$, where K_ν(·) is the modified Bessel function of the third kind of order ν. The model parameters for K-distributions are computed from $\langle \rho^2 \rangle = (2/b)^2 \nu$ and $\langle \rho^4 \rangle = 2(1 + 1/\nu) \langle \rho^2 \rangle^2$, the second and fourth moment, respectively [3].
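The moment-based fit of the K-distribution parameters can be sketched as follows. This is an illustrative Python fragment under the assumption that the standard K-distribution moment relations hold, namely second moment m2 = (2/b)²ν and fourth moment m4 = 2(1 + 1/ν)m2², which invert to ν = 2/(m4/m2² - 2) and b = 2·sqrt(ν/m2). The function names are hypothetical.

```python
import math

def sample_moments(amplitudes):
    """Second and fourth sample moments of the clutter amplitude."""
    n = len(amplitudes)
    m2 = sum(a ** 2 for a in amplitudes) / n
    m4 = sum(a ** 4 for a in amplitudes) / n
    return m2, m4

def k_params_from_moments(m2, m4):
    """Shape nu and scale b of the K-distribution from the moment relations
    m2 = (2/b)^2 * nu and m4 = 2 * (1 + 1/nu) * m2^2 (assumed above)."""
    ratio = m4 / (m2 * m2)
    if ratio <= 2.0:
        # ratio -> 2 is the Rayleigh limit (nu -> infinity); no finite nu fits
        raise ValueError("moment ratio must exceed 2 for a K-distribution fit")
    nu = 2.0 / (ratio - 2.0)
    b = 2.0 * math.sqrt(nu / m2)
    return nu, b

if __name__ == "__main__":
    # Moments of an exact K-distribution with nu = 2, b = 1: m2 = 8, m4 = 192.
    print(k_params_from_moments(8.0, 192.0))
```

In practice `sample_moments` would be applied to the envelope sqrt(I² + Q²) of the recorded data, and the moment ratio check guards against data that are closer to Rayleigh than any K-distribution.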

The data used in this study are from the dual polarized X-band instrumentation-quality radar operating at 9.4 GHz. Quadrature demodulators in the receiver channels provide in-phase (I) and quadrature (Q) outputs for further signal processing. Each component is represented using 8 bits. The radar pulse-width is 200 ns, which results in a cell resolution of 15 m. The sampling period was T_s = 4 ms. The data set length for analysis was 10,000 samples from one range-bin. The data were collected at Cape Bonavista, Canada, and the radar was located on a cliff edge overlooking the Atlantic ocean at a height of 22 m above sea level. The environmental conditions were chosen so as to simulate shipboard radars where the grazing angle is low.

In Fig. 1(a), we present the CDF of vertically (VV) polarized noise (or rather its amplitude) based on (i) the empirical CDF, (ii) the K-distributed model and (iii) the alpha-stable model. For the particular data set presented, it is hard to determine which model provides a better fit based on visual evaluation. Note, however, that from the radar perspective, standard statistical tests such as the chi-square or Kolmogorov-Smirnov goodness-of-fit tests are of limited use for clutter data [3]. This is because in radar, goodness of fit is mainly important in the region of low probability of false alarm (P_fa), i.e., 1 - cdf(·). When we go to the tails of the amplitude distribution as in Fig. 1(b), where we show log(1 - cdf(·)) for the same data set, it is still hard to resolve the question of a better fit. However, this data set is not characteristic of the database available. Figure 2 compares log(1 - cdf) for representative data sets from the OHGR database for HH and VV polarized clutter using the empirical cdf and those calculated for the two models considered. From this figure, it is evident that in the low regions of P_fa, the K-distribution provides a better fit to the data than the stable distribution, which has tails that are much too heavy.

Possible sources of erroneous analysis include: (i) incorrect parameter estimation and (ii) problems with the calculation of the cdf for stable distributions. However, before applying our tests to real data sets, we tested our procedures on synthetic data, and the results obtained corroborated our observations for radar data.

Discussion 

When a radar illuminates a large patch of the sea, it is usually found that the pdf of the envelope of the backscattered echo is well approximated by a Rayleigh distribution. This is a consequence of the Central Limit Theorem (CLT) because the signal can be thought of as the summation of an "infinite" number of random components from independent scattering centers. However, for a narrow beam and short pulse-length radar, the conditions (such as the large number of scatterers) for the CLT to hold are not met, and Gaussian statistics are no longer appropriate. This was the reason for introducing the K-distribution into clutter modeling. This model can account for the fluctuating but finite number of scatterers in a small area of the sea surface, with the average number of contributing scatterers proportional to the area of the sea surface illuminated by the radar. In analytical stable modeling, one can account for the finite number of interferers, as well as propagation conditions; however, one cannot ignore scattering centers close to the receiver. In high resolution radar, scattering centers close to the receiver do not contribute to the interference, and the backscattering effect does not comply with analytical stable modeling [4] (the size of the patch illuminated is much smaller than the distance of this patch from the radar).


5  Conclusion 


In this paper, we compared the fit of the K-distribution and the alpha-stable distribution to real-world radar data by performing delicate tests on the tail characteristics of the distributions. Our general observation is that for most of the data sets available, the K-distribution provides a better fit to the experimental data than the alpha-stable distribution. We provide a possible theoretical explanation for this behavior. As an additional result, we presented a computationally efficient method to calculate the cdf of stable distributions.
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a)  Tail  of  CDF  for  VV  polarized  clutter 

Figure  2  Representative  tails  of  CDF 


References 

[1] M. Abramowitz and I. Stegun. Handbook of Mathematical Functions. National Bureau of Standards, Washington, 1972.

[2] J. Bird. Error performance of binary NCFSK in the presence of multiple tone interference and system noise. IEEE Trans. Commun., COM-33, No.3:203-209, March 1985.

[3] H. Chan. Radar sea-clutter at low grazing angles. IEE Proceedings, Pt.F, Vol.137, No.2:102-112, April 1990.

[4] J. Ilow. Signal Processing in Alpha-Stable Noise Environments: Noise Modeling, Detection and Estimation. Ph.D. thesis, University of Toronto, URL: http: //w ww.comm.toronto.edu/ ilow /ilow. html, Toronto, 1996.

[5] K. Fang, S. Kotz and Ng Kai. Symmetric Multivariate and Related Distributions. Chapman and Hall, London, 1990.


b)  Tail  of  CDF  for  HH  polarized  clutter 
for  the  OHGR  database  clutter. 


[6] R. Lord. The use of the Hankel transform in statistics. Biometrika, 41:44-55, 1954.

[7] X. Ma and C. Nikias. Parameter estimation and blind channel identification in impulsive signal environments. IEEE Trans. Signal Processing, 43, No.12:2884-2897, Dec. 1995.

[8] C. Nikias and M. Shao. Signal Processing with Alpha-Stable Distributions and Applications. John Wiley & Sons, Inc., New York, 1995.

[9] T. Nohara and S. Haykin. Growler detection in sea clutter using Gaussian spectrum models. IEE Proceedings, Pt.F, Vol.141, No.5:285-291, Oct. 1994.

[10] G. Tsihrintzis and C. Nikias. Performance of optimum and suboptimum receivers in the presence of impulsive noise modeled as an alpha-stable process. IEEE Trans. Commun., 43, No.2/3/4:904-914, March 1995.




On the Modeling of Network Traffic and Fast Simulation of Rare
Events using α-Stable Self-Similar Processes


Anestis Karasaridis and Dimitrios Hatzinakos
University of Toronto

Dept. of Electrical and Computer Engineering
Toronto, Ont. M5S 3G4, Canada

email: {anestis, dimitris}@comm.toronto.edu


Abstract 

We present a new model for aggregated Network traffic based on α-Stable Self-Similar processes which captures the burstiness and the Long Range Dependence of the data. We show how the Fractional Gaussian noise assumption fails and why our proposed model fits well by comparing real and synthesized network traffic. In addition, we show that we can speed up the simulation times for estimation of rare event probabilities, such as cell losses in ATM switches, by up to three orders of magnitude using α-Stable modeling and Importance Sampling.


1  Introduction 

In the area of Telecommunication networks, recent high-quality, high-resolution extensive measurements of Local Area Network traffic [7], Variable-Bit-Rate compressed video streams [2], Wide Area Network traffic [12] and Web client-server traffic [3] have shown that the aggregated traffic in networks does not exhibit Poisson characteristics but rather follows similar statistics over a wide range of time-scales. The assumption that in general the traffic is a Self-Similar process (SSP) can explain the existence of Long Range Dependence (LRD) (i.e. slowly decaying correlation) and the persistent burstiness regardless of the time-scale.

One of the main implications of the above observations in Network engineering is that the aggregation of traffic in statistical multiplexers and ATM switches does not smooth out the overall traffic if the individual sources are bursty. Similarly, deterministic traffic smoothers are of questionable value. Also, according to new results on buffer occupancy distributions for Self-Similar inputs, the buffer cell loss probability decreases with the buffer size only algebraically, in contrast to Markovian models where the decrease is exponential. The average cell delay always increases with the buffer size, whereas in Markovian models it does not exceed a certain limit regardless of the buffer size [8],[10].

Since the main properties of high speed network traffic are high variability and self-similarity, two independent modeling approaches emerged among others. The first is the use of heavy-tailed distributions (e.g. Pareto) to account for the high variability [7], and the other is the use of Self-Similar processes (e.g. Fractional Gaussian noise) to account for the statistical self-similarity of the data [11]. While these approaches give better results than the simple Poisson or Compound Poisson models, they fail to unify the desired model properties.

On the other hand, α-Stable SSP's can capture both the high variability (burstiness), since the underlying distribution is heavy-tailed, and the self-similarity. Furthermore, they provide a physical interpretation of how the observed data appear as the superposition of independent effects according to the Generalized Central Limit Theorem.

This paper addresses two problems: a) Network Traffic modeling using α-Stable Self-Similar models and b) Fast simulation of Rare Events based on α-Stable modeling. A new simple and parsimonious model is proposed that provides a better connection between the model parameters and the physical generation of Self-Similar processes. The validity of the proposed model is tested by fitting real network traffic data obtained from the ftp site of Bellcore. It is also shown that the proposed model is a powerful tool in deriving new fast simulation methods of rare events (such as cell losses in ATM switches) in network traffic.
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2  Network  traffic  modeling 


As a measure of network traffic we use the packet counts (number of packets arrived/time unit) or the interarrival times of the packets. Real data were obtained from Bellcore's ftp site (ftp.bellcore.com) and the modeling was applied to external traffic data (file OctExt.TL) which give the time-stamps and length of Ethernet packets through the main gateway. The external data show more variability than the internal and to the authors' knowledge have not yet been modeled successfully.

Our proposed model is based on the well balanced Linear Fractional Stable Noise (LFSN) which is defined as follows [13]:

$$L_{\alpha,H}(i) = \int_{-\infty}^{\infty} \left( |i + 1 - x|^{H-1/\alpha} - |i - x|^{H-1/\alpha} \right) M_S(dx), \tag{1}$$

where M_S(dx) is an α-Stable random measure with Lebesgue control measure, H is the Hurst or self-similarity parameter, α is the characteristic exponent of the α-Stable distribution and S is the vector of the α-Stable distribution parameters (α, σ, β, μ). It is a stationary process with LRD if H > 1/α and it is a generalization of the Fractional Gaussian noise which is obtained by replacing α with 2. The above integral (1) in discrete form can be approximated by the convolution of the linear filter h_d(i) = |i + 1|^d - |i|^d, d = H - 1/α, i ∈ Z, with iid α-Stable random variables denoted by S_{α,σ,β,μ}(i):

$$L_{\alpha,\sigma,\beta,\mu,H}(i) = h_d(i) * S_{\alpha,\sigma,\beta,\mu}(i). \tag{2}$$

The proposed α-Stable Self-Similar model is linear and has the following expression:

$$M(i) = c_1 \cdot L_{\alpha,H}(i) + c_2 = c_1 \cdot h_d(i) * S_{\alpha,1,1,0}(i) + c_2, \qquad c_1, c_2 \in \mathbb{R}^{+}, \tag{3}$$

where c1 and c2 are positive real constants and L_{α,H}(i) is Linear Fractional Stable Noise (LFSN) with β = 1, σ = 1 and μ = 0, and H > 1/α to ensure long range dependence.

Since β = 1 the LFSN process is totally skewed. This does not imply that the density function has support only on the positive x axis for all α's. It is strictly positive only for α < 1, but this condition is very restrictive for our modeling since we impose the inequality α > 1/H, where 0 < H < 1. Also, the condition that α is greater than 1 ensures that the mean of the LFSN exists, according to the properties of α-Stable distributions.

The expected value of the model is

$$E[M(i)] = c_1 \cdot \sum_{k} h_d(k) \cdot E[S_{\alpha,1,1,0}(i - k)] + c_2 = c_2, \tag{4}$$

since E[S_{α,1,1,0}(i)] = μ = 0. The positive constant c1 is equivalent to the scaling parameter σ of the α-Stable distribution S. In other words the model in (4) is equivalent to the following: M(i) = h_d(i) * S_{α,c1,1,0}(i) + c2. The Characteristic Function (CF) of S_{α,1,1,0} is:

$$\Phi_S(\omega) = \exp\left[ -|\omega|^{\alpha} \left( 1 - i\, \mathrm{sgn}(\omega) \tan(\pi\alpha/2) \right) \right], \tag{5}$$

and from (4), LFSN's CF becomes

$$\ln \Phi_L(\omega) = -|\omega|^{\alpha} \sum_{i} |h_d(i)|^{\alpha} \left( 1 - i\, \mathrm{sgn}(\omega) \tan(\pi\alpha/2) \right). \tag{6}$$

As a result the CF of the model process M is:

$$\ln \Phi_M(\omega) = -|c_1 \omega|^{\alpha} \sum_{i} |h_d(i)|^{\alpha} \left( 1 - i\, \mathrm{sgn}(\omega) \tan(\pi\alpha/2) \right) + j c_2 \omega. \tag{7}$$
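The discrete approximation of the model, a linear filter h_d convolved with iid totally skewed α-Stable variables, then scaled by c1 and shifted by c2, can be sketched in Python as follows. This is a hedged illustration rather than the authors' generator: the stable variates are drawn with the Chambers-Mallows-Stuck method (an assumption, the paper does not say which generator was used), the filter is truncated to a finite two-sided window whose half-width is a hypothetical parameter, and all function names are illustrative.

```python
import math
import random

def stable_rv(alpha, beta, rng):
    """One draw from S_alpha(sigma=1, beta, mu=0), alpha != 1, using the
    Chambers-Mallows-Stuck method (assumed here, not taken from the paper)."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    t = beta * math.tan(math.pi * alpha / 2)
    b = math.atan(t) / alpha
    s = (1.0 + t * t) ** (1.0 / (2.0 * alpha))
    c = max(math.cos(v - alpha * (v + b)), 1e-300)   # guard against rounding below zero
    return (s * math.sin(alpha * (v + b)) / math.cos(v) ** (1.0 / alpha)
            * (c / w) ** ((1.0 - alpha) / alpha))

def lfsn_filter(d, half_width):
    """Truncated filter h_d(i) = |i + 1|^d - |i|^d for i = -half_width..half_width."""
    return [abs(i + 1) ** d - abs(i) ** d for i in range(-half_width, half_width + 1)]

def synthesize_traffic(n, alpha, hurst, c1, c2, half_width=200, seed=0):
    """n samples of M(i) = c1 * (h_d * S_{alpha,1,1,0})(i) + c2 with d = H - 1/alpha."""
    rng = random.Random(seed)
    h = lfsn_filter(hurst - 1.0 / alpha, half_width)
    noise = [stable_rv(alpha, 1.0, rng) for _ in range(n + len(h) - 1)]
    out = []
    for i in range(n):
        acc = 0.0
        for k, hk in enumerate(h):                   # "valid" part of the convolution
            acc += hk * noise[i + len(h) - 1 - k]
        out.append(c1 * acc + c2)
    return out

if __name__ == "__main__":
    # Parameter values of the kind reported in the text for the packet counts.
    traffic = synthesize_traffic(200, alpha=1.63, hurst=0.85, c1=100.0, c2=39.0)
    print(min(traffic), max(traffic))
```

The truncation of the filter limits the range over which long range dependence is reproduced, so the half-width would need to grow with the longest time-scale of interest.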

2.1  Model  parameter  estimation 

The suggested parametric model is quite parsimonious since it depends on a set of only four parameters: (H, α, c1, c2). The self-similarity parameter H is estimated first, by using the R/S statistic [9],[1]. Estimates of the R/S statistic were calculated for both the packet-count and the interarrival-time processes and show that they are long range dependent since H > 0.5 in both cases (H = 0.85 for the packet counts and H = 0.78 for the interarrival times).
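A common way to turn the R/S statistic into an estimate of H is to average the rescaled range over non-overlapping blocks of several sizes and fit a line to log(R/S) against log(block size); the slope is the estimate. The Python sketch below illustrates that procedure under these assumptions. The block sizes and function names are hypothetical, and the paper does not state which variant of the R/S analysis was used.

```python
import math
import random

def rescaled_range(block):
    """R/S of one block: range of the cumulative deviations from the block
    mean, divided by the (population) standard deviation of the block."""
    n = len(block)
    mean = sum(block) / n
    cum, lo, hi, sq = 0.0, float("inf"), float("-inf"), 0.0
    for x in block:
        cum += x - mean
        lo, hi = min(lo, cum), max(hi, cum)
        sq += (x - mean) ** 2
    s = math.sqrt(sq / n)
    return (hi - lo) / s if s > 0 else None

def hurst_rs(series, block_sizes):
    """Slope of log(mean R/S) versus log(block size) by least squares."""
    xs, ys = [], []
    for m in block_sizes:
        vals = [rescaled_range(series[i:i + m])
                for i in range(0, len(series) - m + 1, m)]
        vals = [v for v in vals if v is not None]
        if vals:
            xs.append(math.log(m))
            ys.append(math.log(sum(vals) / len(vals)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

if __name__ == "__main__":
    rng = random.Random(0)
    white = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    # For independent data the estimate should be near 0.5 (slightly above,
    # because of the known small-sample bias of R/S).
    print(hurst_rs(white, [16, 32, 64, 128, 256, 512]))
```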

Parameter c2 is calculated as the mean of the modeled process as we saw in (4). In practice c2 has to be adjusted slightly since the generator of LFSN according to (4) might give a small nonzero average.

The other parameters α and c1 can be calculated by minimizing the mean absolute error between the model and the real data:

$$\min_{1/H < \alpha < 2,\; c_1 > 0} E\, |X - c_1 \cdot L_{\alpha,H} - c_2|, \tag{8}$$

where X is the vector of the real data corresponding to either the packet-count or the interarrival-time process. The existence of the mean absolute error is guaranteed since α > 1 for the permissible range of H, as mentioned above. Results from simulations using the above procedure for parameter estimation are shown in figure 1 where we see that the synthesized traffic approximates the real one very well. We also note that
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to  different  sets  of  parameters  depending  on  the  initial 
estimates. 

In the following section we provide a link between the use of α-Stable distributions for network traffic modeling and the feasibility of fast simulations of network traffic for the estimation of the probability of rare events such as cell losses or excessive delays.

3 Fast simulation of networks using α-Stable modeling and Importance Sampling


Figure 1. (a) Actual traffic (packet counts) with scaling 10 sec, and (b) synthesized traffic using the new model proposed (see eq. 4) for α = 1.63, c1 = 100 and c2 = 39. The model parameters were estimated using the minimum absolute error approach (see eq. 8).


the estimated value of α is far from 2, which means that the model is far from being Gaussian. It was also suggested in [11] that this type of traffic cannot be modeled using Fractional Gaussian Noise.

Another method to estimate the parameters α and c1 is by using the expression of the Characteristic Function of the suggested model. An unbiased and consistent estimate of the CF can be obtained by employing the Sample Characteristic Function (SCF) [6]. From (7) it follows that:

$$\ln\left( -\ln |\Phi_M(\omega)| \right) = \alpha \cdot \ln |\omega| + \ln c_1^{\alpha} + \ln \sum_{i} |h_d(i)|^{\alpha}, \tag{9}$$

which can be solved for α and c1 using linear regression over uniformly spaced frequencies [6].
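The regression idea can be illustrated on a simpler case. The Python sketch below is an assumption-laden example, not the authors' procedure: it uses iid symmetric data with |Φ(ω)| = exp(-γ|ω|^α) instead of the full filtered model, so that ln(-ln|Φ(ω)|) = α ln|ω| + ln γ, and it recovers α as the slope and γ from the intercept of a least-squares line over uniformly spaced frequencies. For the full model the intercept would instead equal ln c1^α + ln Σ_i |h_d(i)|^α. Cauchy data (α = 1) are used because they are easy to generate; the names and the frequency grid are hypothetical.

```python
import math
import random

def sample_cf_abs(data, w):
    """Magnitude of the Sample Characteristic Function at frequency w."""
    re = sum(math.cos(w * x) for x in data) / len(data)
    im = sum(math.sin(w * x) for x in data) / len(data)
    return math.hypot(re, im)

def fit_alpha_dispersion(data, freqs):
    """Least-squares line through (ln w, ln(-ln|SCF(w)|)); returns the slope
    (estimate of alpha) and exp(intercept) (estimate of the dispersion)."""
    xs = [math.log(w) for w in freqs]
    ys = [math.log(-math.log(sample_cf_abs(data, w))) for w in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, math.exp(my - slope * mx)

if __name__ == "__main__":
    rng = random.Random(1)
    # Cauchy samples with scale 2, i.e. alpha = 1 and dispersion gamma = 2.
    data = [2.0 * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(20000)]
    freqs = [0.1 * k for k in range(1, 11)]
    print(fit_alpha_dispersion(data, freqs))
```

The sensitivity noted in the text shows up here as well: at frequencies where |Φ(ω)| is very small, the double logarithm amplifies the estimation error of the SCF, which is why the frequency grid is kept where the magnitude is not negligible.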

The last method, which uses the SCF, is faster than the first one, which uses the average absolute error, but it is very sensitive to estimation errors of the CF, especially when the scale c1 of the real data is large. On the other hand, the first method might converge

In Network performance evaluation we are interested in estimating the probabilities of unwanted rare events that compromise the quality of service for users. Some of those unwanted events are packet losses, excessive waiting times, or component failures in systems. We need to know how the probabilities of such events change from node to node and also how they affect the end-to-end performance of the system. One disadvantage of moving away from traditional Poisson modeling is that analytical results for delays and overflow probabilities become very hard to compute, especially for non-trivial topologies with networks of queues and complex admission and routing procedures. In such cases, simulation is the only viable solution to characterize the network traffic locally, as well as end-to-end.

Suppose  now  that  we  need  to  simulate  an  ATM 
switch  and  verify  that  under  normal  conditions  the 
probability  of  cell  loss  remains  close  to  a  typical  value 
of  10"®.  Therefore,  we  need  at  least  10®,  or  typically 
two  orders  of  magnitude  more  packets  (to  reduce  the 
variance  of  the  estimation),  to  go  through  the  switch 
during  the  simulation.  Thus,  there  will  be  a  need  for 
generation  of  very  long  data  records  and  many  repe¬ 
titions  of  the  experiment  which  will  require  very  long 
simulation  times. 

Importance Sampling (IS) is a method to reduce simulation times by infusing an increased number of rare events into the system and unbiasing the output by multiplying it by a likelihood function. In the following, we will give a short introduction to IS and show how α-Stable modeling can be used in this framework to build fast simulators.

Suppose that the probability of a rare event P_re is equal to the probability that the random variable X, with density function p(x), is in the set A, P(X ∈ A):

$$P_{re} = P(X \in A) = \int 1_{(x \in A)}\, p(x)\, dx = E_p\left[ 1_{(X \in A)} \right]. \tag{10}$$

The idea behind IS is to replace p(x) by [p(x)/g(x)] g(x) = L(x)g(x) in order to infuse more events (X ∈ A) into the system by using a different density function g(x) instead of p(x). From (10) we get:

$$P_{re} = \int 1_{(x \in A)}\, \frac{p(x)}{g(x)}\, g(x)\, dx = E_g\left[ 1_{(X \in A)}\, L(X) \right]. \tag{11}$$

An unbiased estimate of the probability of the rare event is given by taking N samples generated by the PDF g(x), (X_1, ..., X_N), and then using the likelihood function L(x) to unbias the estimate:

$$\hat{P}_{re} = \frac{1}{N} \sum_{n=1}^{N} 1_{(X_n \in A)} \cdot L(X_n). \tag{12}$$

If $p(x)$ is $\alpha$-Stable with characteristic exponent $\alpha_1$ and the rare event is in $A = \{X : X > x_0\}$, where $X$ is the queue length and $x_0$ is the buffer size, then $g(x)$ is chosen as $\alpha$-Stable with characteristic exponent $\alpha_2$, where $\alpha_2 < \alpha_1$. Then the probability of a rare event $P(X > x_0)$ increases, since for a general $\alpha$-Stable distribution the tails behave as follows:

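$$\lim_{x_0 \to \infty} x_0^{\alpha}\, P(X > x_0) = 0.5\, C_\alpha (1 + \beta)\, \sigma^{\alpha} \tag{13}$$

where $C_\alpha$ is a constant, $\beta$ is the skewness parameter and $\sigma$ is the scale parameter of the distribution.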
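As an illustration of (10)-(12), the following minimal Python sketch estimates a small tail probability by importance sampling. It is not the simulator used in this paper: to avoid numerical Fourier inversion it assumes the two stable laws with closed-form densities, taking $p(x)$ as the $\alpha = 2$ law (Gaussian with variance 2 for unit dispersion) and $g(x)$ as the $\alpha = 1$ law (Cauchy), so that $\alpha_2 < \alpha_1$. The function names and the threshold 5.26 are illustrative assumptions.

```python
import math
import random


def target_pdf(x):
    """p(x): alpha = 2 stable law with unit dispersion, i.e. N(0, 2)."""
    return math.exp(-x * x / 4.0) / (2.0 * math.sqrt(math.pi))


def biasing_pdf(x):
    """g(x): alpha = 1 stable law with unit dispersion (standard Cauchy)."""
    return 1.0 / (math.pi * (1.0 + x * x))


def is_tail_estimate(x0, n_samples, rng):
    """Estimate P(X > x0) under p by sampling from g, as in (12)."""
    total = 0.0
    for _ in range(n_samples):
        # Inverse-CDF draw from the standard Cauchy density g(x).
        x = math.tan(math.pi * (rng.random() - 0.5))
        if x > x0:
            # Weight each rare event by the likelihood ratio L(x) = p(x)/g(x).
            total += target_pdf(x) / biasing_pdf(x)
    return total / n_samples


def exact_tail(x0):
    """Closed-form P(X > x0) for N(0, 2), used only as a reference value."""
    return 0.5 * math.erfc(x0 / 2.0)


if __name__ == "__main__":
    rng = random.Random(1)
    print("IS estimate:", is_tail_estimate(5.26, 200000, rng))
    print("exact value:", exact_tail(5.26))
```

Because the Cauchy biasing density places far more mass beyond the threshold than the Gaussian does, many samples land in the rare-event set, and the likelihood ratio removes the resulting bias.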

In the following section we provide results of simulations that show how accurate the estimator is over different simulation-time gains, and how its variance is affected by the choice of the twisting parameter $\alpha_2$.

3.1  Simulations 

We assume that the input process follows a symmetric $\alpha$-Stable distribution (S$\alpha$S) with $\alpha_1 = 1.8$ and that the rare event is $A = \{X : X > x_0\}$, where $x_0$ is selected from fractile tables so that $P_{re} = P(X > x_0) = 10^{-4}$. For the above parameters $P_{re}$ and $\alpha_1$, we find that $x_0 = 44.28$. We chose to simulate a regular $\alpha$-Stable process rather than a Self-Similar $\alpha$-Stable one because there exist very detailed fractile tables, and also analytical expressions, that allow very accurate calculation of the $P_{re}$ of our choice and provide a basis for comparison with the simulation results.

Since there is no closed-form PDF for $\alpha$-Stable distributions except for a few special cases, we used numerical computation by Fourier inversion of the general form of the Characteristic Function.

We carried out simulations for different numbers of samples, $N = 10^2$, $10^3$, $5 \cdot 10^3$ and $N = 10^4$, for a range of $\alpha_2$, where we calculated the average estimated $P_{re}$ and its variance over 10 iterations of the same experiment. Note that if $P_{re}$ is given by (10) and is estimated by simulation, $\hat{P}_{re} = \frac{1}{N} \sum_{n=1}^{N} I_n$, where $I_n$ is the indicator function of the set $A$. Then for a 10%
Figure 2. Importance Sampling fast simulation for sample size N = 1000. (a) Estimated $P_{re}$ and (b) variance of estimated $P_{re}$. We see that as the twisting parameter $\alpha_2$ is decreased the estimates become more accurate and the variance decreases.


error in the estimation with 99% confidence, we would need $N = 6.635 \times 10^{6}$ samples, by the Central Limit Theorem. Figures 2 and 3 show the results of the simulations for $N = 1000$ and $N = 5000$, respectively. We see that the estimates approach the theoretical value of $10^{-4}$ as the twisting parameter $\alpha_2$ becomes smaller, and also that the variance of the estimates decreases. Even though the sample size is three orders of magnitude smaller than the one required by the Central Limit Theorem without Importance Sampling, the error of the estimate is less than 10% for all $\alpha_2 < 1.1$ with $N = 1000$ and less than 5% with $N = 5000$.
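The sample size quoted for plain Monte Carlo can be checked with a short calculation, sketched here under the usual assumptions of independent samples and a normal approximation. The symbols $\varepsilon$ (relative error, 0.1) and $z$ (two-sided 99% normal quantile, 2.576) are introduced only for this check.

```latex
\operatorname{Var}(\hat{P}_{re}) = \frac{P_{re}(1 - P_{re})}{N},
\qquad
z \sqrt{\frac{P_{re}(1 - P_{re})}{N}} \le \varepsilon P_{re}
\;\Longrightarrow\;
N \ge \frac{z^{2}(1 - P_{re})}{\varepsilon^{2} P_{re}}
\approx \frac{(2.576)^{2}}{(0.1)^{2} \times 10^{-4}}
\approx 6.635 \times 10^{6}
```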

4  Conclusions 

We showed that $\alpha$-Stable Self-Similar models can be a very effective tool in modeling future high-speed network traffic. In addition, they can provide the foundation for fast network simulators for effective network evaluation and management.
Figure 3. Importance Sampling fast simulation for sample size N = 5000. (a) Estimated $P_{re}$ and (b) variance of estimated $P_{re}$.


References

[1] Beran J., Statistics for Long-Memory Processes, Chapman & Hall, 1994.

[2] Beran J., Sherman R., Taqqu M.S., Willinger W., “Long-Range Dependence in Variable-Bit-Rate Video Traffic”, IEEE Trans. Comm., Vol. 43, No. 2/3/4, Feb/Mar/Apr 1995.

[3] Crovella M.E., Bestavros A., “Explaining World Wide Web Traffic Self-Similarity”, Technical Report TR-95-015, Boston University, 1995.

[4] Heidelberger P., “Fast Simulation of Rare Events in Queuing and Reliability Models”, ACM Trans. on Model. and Comp. Simul., Vol. 5, No. 1, Jan. 1995.

[5] Kogon S.M., Manolakis D.G., “Signal Modeling with Self-Similar $\alpha$-Stable Processes: The Fractional Levy Stable Motion Model”, IEEE Trans. Sig. Proc., Vol. 44, No. 4, April 1996.

[6] Koutrouvelis I.A., “Regression-Type Estimation of the Parameters of Stable Laws”, J. Amer. Stat. Assoc., Vol. 75, No. 372, Dec. 1980.

[7] Leland W.E., Taqqu M.S., Willinger W., Wilson D.V., “On the Self-Similar Nature of Ethernet Traffic (Extended Version)”, IEEE/ACM Trans. Networking, Vol. 2, No. 1, Feb. 1994.

[8] Likhanov N., Tsybakov B., Georganas N.D., “Analysis of an ATM Buffer with Self-Similar (Fractal) Input Traffic”, Proc. INFOCOM 95, Boston, MA.

[9] Mandelbrot B.B., Van Ness J.W., “Fractional Brownian Motions, Fractional Noises and Applications”, SIAM Rev., Vol. 10, pp. 422-436, Oct. 1968.

[10] Norros I., “A storage model with self-similar input”, Queueing Systems, 16 (1994), pp. 387-396.

[11] Norros I., “On the Use of Fractional Brownian Motion in the Theory of Connectionless Networks”, IEEE J. Sel. Areas in Comm., Vol. 13, No. 6, Aug. 1995.

[12] Paxson V., Floyd S., “Wide-Area Traffic: The Failure of Poisson Modeling”, IEEE/ACM Trans. Networking, 3(3), pp. 226-244, June 1995.

[13] Samorodnitsky G., Taqqu M.S., Stable Non-Gaussian Random Processes, Chapman & Hall, 1994.




TDE, DOA AND RELATED PARAMETER ESTIMATION PROBLEMS IN IMPULSIVE NOISE


Ananthram  Swami  Brian  Sadler 

Army  Research  Lab 

AMSRL-IS-TA, 2800 Powder Mill Road, Adelphi, MD 20783-1197, USA
e-mail: (aswami,bsadler)@emh3.arl.mil


ABSTRACT 

We address the problem of time-delay estimation (TDE) and direction-of-arrival (DOA) estimation in the presence of symmetric alpha-stable noise. We show that these problems can be handled by conventional correlation or cumulant based techniques, provided that the noisy signals are first passed through a generic zero-memory non-linearity. This pre-processing is also useful in the detection context. We also address the problem of blind linear system identification, where the input is an iid alpha-stable process; we show that consistent estimates of the possibly non-minimum phase ARMA parameters can be obtained by using self-normalized correlations and cumulants. Theoretical arguments are supported by simulations.


1  INTRODUCTION 

In typical signal processing applications, additive noise can often be well modeled as the sum of a nominal stationary Gaussian component (thermal noise, etc.) and high-amplitude non-stationary, non-Gaussian (NG) components [16]. The NG component may be modeled as the output of a linear filter excited by an inhomogeneous Poisson process or random high-amplitude transients [16, pp 184-196]. In the stationary setting, the NG component can be modeled by a filtered Bernoulli-Gaussian product process [9], or as the realization of a Gaussian mixture process, which are special cases of the Middleton model [16, pp 137-215].

More recently, a lot of attention has been given to the stationary symmetric alpha stable (S$\alpha$S) process. The characteristic function of the S$\alpha$S random variable (rv) is given by [8, 10], $E\{\exp(jvx)\} = \exp(-\gamma |v|^{\alpha})$, $0 < \alpha \le 2$, $\gamma > 0$. Parameter $\alpha$ is the index or characteristic exponent, $\gamma$ is the dispersion, and $c := \gamma^{1/\alpha}$ is called the scale parameter. Apart from the lack of a closed-form pdf ($\alpha \ne 1$, $\alpha \ne 2$), the S$\alpha$S rv possesses various interesting properties, such as tail probabilities of order $|x|^{-\alpha}$ and $E|x|^{p} = \infty$, $p \ge \alpha$, $\alpha < 2$, etc. [10, 8]. Note that sample estimates of $|x|^{p}$ are consistent only for $-1/2 < p < \alpha/2$.

Because of the ‘infinite variance’ of the S$\alpha$S rv, the covariation,

$$C_\alpha(x, y) := \gamma_y\, E\{X |Y|^{p-2} Y^{*}\} / E|Y|^{p}, \quad p < \alpha \le 2 \tag{1}$$

and the covariation coefficient, $\lambda_{x,y} := C_\alpha(x, y)/\gamma_y$, have been used instead of the correlation in various applications [5]. The covariation is defined only if both $X$ and $Y$ are S$\alpha$S with the same $\alpha$. In [13], we have shown that estimates of the normalized correlation show much less variability than those of the covariation. Motivated by [1], we used normalized correlations to estimate the ‘spectrally’ equivalent minimum-phase (SEMP) parameters of a linear S$\alpha$S process. Here, we will define and use normalized cumulants to estimate parameters of mixed-phase models.


A classical approach to detection in impulsive noise is to use a locally optimum detector (LOD); but this requires knowledge of the noise pdf. An alternative is to use a zero-memory non-linearity (ZMNL) which captures the LOD behavior for a class of pdfs. Pre-processing the data by using a signed fractional power, such as in the covariation, is equivalent to using a particular ZMNL, but is not optimal in any sense. We use ZMNL pre-processing followed by correlation-based processors for both detection and estimation. We demonstrate its use in the context of harmonic retrieval, direction-of-arrival (DOA) estimation, and time-delay estimation (TDE).

2  LINEAR  SaS  PROCESSES 

We address the problem of estimating the ARMA parameters of the process,

$$y(n) = -\sum_{k=1}^{p} a(k)\, y(n-k) + \sum_{k=0}^{q} b(k)\, u(n-k), \tag{2}$$

where $u(n)$ is iid S$\alpha$S. The impulse response (IR), $h(n)$, is assumed to satisfy the $\alpha$-summability condition: $\exists\, \delta \in (0, \alpha) \cap (0, 1)$ such that $\sum_{m} |h(m)|^{\delta} < \infty$.

Hannan [2] showed that the correlation-based Yule-Walker equations lead to consistent estimates for the pure AR case. Davis and Resnick [1] showed that the sample estimates of the normalized correlation of a linear S$\alpha$S process are consistent,

$$\hat{R}_{xx}(m) := \sum_{n=1}^{N} x(n)\, x^{*}(n+m) \Big/ \sum_{n=1}^{N} |x(n)|^{2} \tag{3}$$

$$\to R_{hh}(m) := \sum_{j=-\infty}^{\infty} h(j)\, h(j+m) \Big/ \sum_{j=-\infty}^{\infty} |h(j)|^{2} \tag{4}$$

where the convergence is in probability. Mikosch et al. [7], extending the results of [1], showed that the sample estimate of the self-normalized periodogram is consistent, and that Whittle-type estimators based on it lead to consistent estimates of the ARMA parameters. All of these papers also established rates of convergence, of the form $(N/\ln N)^{1/\alpha}$. The ML estimator for a MA(1) Cauchy process was derived
in [15]. The $\alpha$-spectrum was introduced by Ma and Nikias [4], who showed that it retains both amplitude and phase information, thus establishing that non-minimum phase linear systems can, in principle, be blindly estimated. However, consistency of the sample estimate of the $\alpha$-spectrum, or of the resulting (truncated) cepstral coefficients, was not established.
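The self-normalized correlation in (3) is simple to compute. The sketch below is a minimal illustration for real-valued data, assuming the lag sum is truncated at the end of the record; the function names and the Cauchy-driven MA(1) test signal are illustrative choices, not the setup of the examples in this paper.

```python
import math
import random


def normalized_corr(x, m):
    """Self-normalized sample autocorrelation of real data at lag m >= 0."""
    num = sum(x[n] * x[n + m] for n in range(len(x) - m))
    den = sum(v * v for v in x)
    return num / den


def simulate_ma_cauchy(h, n_samples, rng):
    """MA process x(n) = sum_k h[k] u(n-k) driven by iid standard Cauchy u(n)."""
    taps = len(h)
    u = [math.tan(math.pi * (rng.random() - 0.5))
         for _ in range(n_samples + taps - 1)]
    return [sum(h[k] * u[n + taps - 1 - k] for k in range(taps))
            for n in range(n_samples)]


if __name__ == "__main__":
    h = [1.0, 0.5]
    x = simulate_ma_cauchy(h, 5000, random.Random(0))
    # Limit predicted by (4) for this h: 0.5 / (1 + 0.25) = 0.4
    print("estimate at lag 1:", normalized_corr(x, 1))
```

For a record containing a single isolated impulse response, the estimate equals $R_{hh}(m)$ exactly, which gives some intuition for why strongly impulsive inputs are helpful here.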

Motivated by the results of [1, 7], we proposed to use correlation-based algorithms to estimate the SEMP model parameters in [13]; of course, inherent all-pass factors cannot be identified via this approach. We define normalized fourth-order moments via


$$M_{4x}(\tau_1, \tau_2, \tau_3) := \frac{\sum_{n} x(n)\, x(n+\tau_1)\, x(n+\tau_2)\, x(n+\tau_3)}{\left(\sum_{n} |x(n)|^{2}\right)^{2}} \tag{5}$$

Normalized moments of other orders (and conjugations in the complex case) can be defined similarly. Fourth-order cumulants can now be defined in the usual way,

$$C_{4x}(\tau_1, \tau_2, \tau_3) = M_{4x}(\tau_1, \tau_2, \tau_3) - R_{xx}(\tau_1) R_{xx}(\tau_2 - \tau_3) - R_{xx}(\tau_2) R_{xx}(\tau_3 - \tau_1) - R_{xx}(\tau_3) R_{xx}(\tau_1 - \tau_2) \tag{6}$$

where $R_{xx}$ was defined in (3). We claim that

$$C_{4x}(\tau_1, \tau_2, \tau_3) \to \kappa\, \frac{\sum_{n} h(n)\, h(n+\tau_1)\, h(n+\tau_2)\, h(n+\tau_3)}{\left(\sum_{n} |h(n)|^{2}\right)^{2}}$$

where $\kappa$ is a constant. Self-normalized polyspectra can then be defined as the Fourier transforms of the cumulant sequences. Rates of convergence and limiting distributions can presumably be established along the lines of [1].

Having estimated normalized correlations and fourth-order cumulants, we can use various algorithms to resolve the true zeros of the linear model via cumulants [6]. We note that our approach holds for $0 < \alpha < 2$, not just for
Example 1. We simulated a MA(3) process for two different values of $\alpha$; additive white Gaussian noise (AWGN) with variance $\sigma_g^2$ was added to the linear process. We estimated the normalized autocorrelation from $N = 2000$ samples; we fitted a long AR model (order 27) and then a MA model to it. Since second-order statistics (SOS) are used, the algorithm yields estimates of the SEMP model. Means and standard deviations of the estimates, averaged over 100 trials, are given in Table 1. Notice that when $\alpha$ is small, the additive noise has very little effect. In the noise-free case, the estimates are unbiased and have very low variance. As $\alpha$ becomes larger, the signal becomes more Gaussian, and it is harder to suppress the effects of the AWGN with shorter data lengths. The smaller the value of $\alpha$, the more spiky are the data. The normalization suppresses all but the strongest impulses; thus the ‘normalized data’ consist of relatively well-separated impulse responses, which facilitates the estimation of the model parameters. The normalization also helps to suppress the effects of additive finite-variance noise.

In order to resolve the true zeros of the model, we estimated the normalized fourth-order cumulants in (6), and then used them in the ‘GM-RC’ least-squares algorithm [6]. Estimates of the mixed-phase model are shown in Table 2. Note the good performance of the estimator. The variance of the estimate increases as $\alpha$ increases; this is because the rate of convergence of the moment estimates is of order $(N/\ln N)^{1/\alpha}$, i.e., for a fixed $N$, we expect better performance when $\alpha$ is smaller (more non-Gaussian). We note that the
non-parametric  a-spectrum  approach  of  [4]  required  over 

5  X  10^  samples  to  yield  good  estimates. 

Example 2. We simulated an AR(4) process, with two different values of $\alpha$, and $N = 2000$. The AR parameters were estimated from the (normalized) correlation (20 lags) via the Yule-Walker equations. Means and standard deviations of the parameter estimates, averaged over 100 trials, are reported in Table 3. Notice the good performance of the estimates, as expected from the results of [2].

Example 3. The observed signal was $y(t) = u(t)/A_1(z) + g(t)/A_n(z)$, where signal $u(t)$ was S$\alpha$S with $\alpha = 1$, $\gamma = 1$; noise $g(t)$ was white Gaussian with variance $\sigma_g^2$. The AR

parameters  were  Ai  =  [1,0,0.75]  and  An  —  [1,  0.4,0.^. 
We estimated the $A_1$ parameters using normalized second- and fourth-order statistics. Means and standard deviations estimated from 100 trials, with $N = 1024$ (first two rows) and $N = 4096$ (last two rows), are shown in Table 4. Notice the good performance of the estimates even in the noisy case. The cumulant-based estimates appear to show an advantage for small $N$ and smaller $\alpha$.

Example  4.  We  simulated  an  ARMA(3,3)  pro¬ 
cess  with  parameters  AR=  [1,  -1.6, 1.21, 
[1.5,-6.21,9.1514,-4.1147],  7  =  1,  a  =  0.5, 1,5.  The  data 
length was $N = 4000$. We used SOS to estimate the SEMP parameters both in the noise-free case and in the noisy case (AWGN, with $\sigma_g^2 = 100$). Means and $\sigma$-bounds of the estimates, averaged over 100 trials, are shown in Fig. 1. The circles show the true SEMP IR. Notice the excellent performance of the estimator. We note that the $\alpha$-spectrum
approach  of  [4]  required  about  10^  samples  in  the  noisefree 

We conclude this section by noting that normalized correlations and cumulants can be used to obtain consistent estimates of the ARMA parameters, even in the presence of finite-variance additive noise (arbitrary color and pdf). Low-variance estimates are obtained even with moderate data lengths ($N = 2000$).

3  ESTIMATION  IN  SaS  NOISE 

The observed signal is described by

$$x(n) = c(n \mid \theta) + w(n), \quad n = 1, \ldots, N. \tag{7}$$

Here $c(n \mid \theta)$ may be either deterministic or a finite-variance signal, $w(n)$ is iid S$\alpha$S, and we want to estimate the random parameter vector $\theta$. The use of MLEs is hampered by the lack of closed-form expressions for the pdfs. Since $E(|w|^{2}) = \infty$, we are clearly in the weak signal regime. The obvious approach is to clip the impulsive noise by passing it through some ZMNL, which is linear near the origin. We use a ZMNL which was suggested by Ljung [9],

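$$s_N = \operatorname{median}\{|x(n) - \operatorname{median}\{x(n)\}|\}, \qquad \delta = 3 s_N,$$

$$\phi(u) = \begin{cases} -\delta \exp\!\left(-(u + \delta)^{2}/2\sigma^{2}\right), & u < -\delta \\ u, & |u| \le \delta \\ \delta \exp\!\left(-(u - \delta)^{2}/2\sigma^{2}\right), & u > \delta \end{cases} \tag{8}$$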

cr  is  a  tuning  parameter  which  we  arbitrarily  set  to 
If the signal is not very weak, it is better to set $\delta$ to $3 s_N + \operatorname{median}\{x(n)\}$. A good ZMNL will essentially transform the S$\alpha$S noise to a more Gaussian-looking one, leaving the signal intact. Hence, algorithms which assume that the additive noise is nominally Gaussian can be used with the ZMNL output. Attractive features of the above ZMNL are that it is data-adaptive, analytically tractable, and computationally light. The ZMNL involves computation of the median; in the context of S$\alpha$S processes, we know that the median is a consistent estimator of location.
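A minimal Python sketch of a data-adaptive clipper of this kind follows. It assumes the piecewise form of the non-linearity: identity for $|u| \le \delta$ and a Gaussian-shaped decay beyond $\pm\delta$, with $\delta$ set to three times the median absolute deviation. The default `sigma = delta` is purely an assumption made for illustration, and the function name `zmnl` is hypothetical.

```python
import math
import statistics


def zmnl(x, sigma=None):
    """Apply a data-adaptive zero-memory non-linearity to the sequence x.

    Returns the transformed samples together with the threshold delta.
    """
    med = statistics.median(x)
    s_n = statistics.median(abs(v - med) for v in x)  # median absolute deviation
    delta = 3.0 * s_n
    if sigma is None:
        sigma = delta  # assumed default for the tuning parameter
    if sigma <= 0.0:
        raise ValueError("sigma must be positive (data may be constant)")

    def phi(u):
        if abs(u) <= delta:
            return u  # linear region near the origin
        # Beyond the threshold the output decays smoothly towards zero.
        tail = delta * math.exp(-((abs(u) - delta) ** 2) / (2.0 * sigma ** 2))
        return math.copysign(tail, u)

    return [phi(u) for u in x], delta


if __name__ == "__main__":
    data = [0.1, -0.2, 0.05, 0.3, -0.1, 1000.0]
    clipped, delta = zmnl(data)
    print(delta, clipped)
```

Samples inside the threshold pass unchanged, while a large impulse is mapped close to zero rather than merely limited, so isolated outliers contribute almost nothing to subsequent correlation estimates.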

We applied this idea to three estimation problems: harmonic retrieval, DOA and TDE. Analytic derivation of performance bounds and comparisons with CRBs will be reported elsewhere (see also Section 5).

Since $w(n)$ has infinite variance, a generalized SNR defined by $E\{|c(t)|^{2}\}/\gamma_w$ has been proposed; however, the empirical SNR for a fixed $\gamma$ varies tremendously with $\alpha$. We suggest that a better definition of SNR is $E\{|c(t)|^{2}\}/E\{\phi^{2}(w(t))\}$, where the denominator is the variance of the ZMNL output.

We applied ZMNL pre-processing to standard parameter estimation problems: harmonic retrieval, DOA and TDE.

3.1 Harmonic Retrieval

The noisy signal is given by $y(n) = \sum_{k=1}^{p} A_k \exp(j 2\pi f_k n) + w(n)$, $n = 1, \ldots, N$, where $w(n)$ is complex isotropic S$\alpha$S, the $A_k$'s are complex, perhaps random, amplitudes, and $f_k$ are the unknown frequencies to be estimated. We compare our correlation-based method to covariation-based methods.
Example 5. We let $p = 2$, $f_1 = 0.1$, $f_2 = 0.2$, $|A_1| = |A_2| = 1$, $\gamma_w = 1$. We passed the noisy complex signal through a circularly symmetric version of the ZMNL in (8); the correlation of the ZMNL output was estimated, and ESPRIT was then applied to estimate the frequencies. Each realization had 640 samples (16 realizations with random phases, each with 40 samples). We also applied ESPRIT to the raw data, and to the covariation matrix estimated with $p = \alpha/4$. The mean estimates (100 trials) are shown in Fig. 2 as a function of $\alpha$, for the two frequencies; the corresponding standard deviations are shown in Fig. 3. In the figures, +, o and × denote estimates based on ZMNL-ESPRIT, raw ESPRIT, covariations and the normalized covariations. Notice that the ZMNL-based method yields low-variance estimates for $\alpha > 0.5$; the raw correlation-based method performs poorly, as expected. Rather surprisingly, covariation-based ESPRIT did not work well; it has been reported in the literature that the Bartlett estimator based on covariations outperforms that based on correlations.

3.2  TDE 

The observed signals are $x(n) = s(n) + u(n)$, $y(n) = A s(n - D) + v(n)$, $n = 1, \ldots, N$, where signal $s(n)$ is deterministic or random with finite variance, $u(n)$ and $v(n)$ are the sensor noises, assumed to be S$\alpha$S, $A$ is the unknown attenuation factor, and $D$ is the unknown delay to be estimated. The sensor signals are passed through the ZMNL in (8) to clip the noise; the delay is estimated by locating the peak of the ‘ML-windowed’ autocorrelation.
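The procedure can be sketched in a few lines of Python. This is a simplified illustration rather than the processor described above: it uses a plain (unwindowed) cross-correlation in place of the ‘ML-windowed’ one, Cauchy ($\alpha = 1$) sensor noise, and a clipper whose tuning parameter is assumed equal to the threshold. All function names are hypothetical.

```python
import math
import random
import statistics


def zmnl(x):
    """Data-adaptive clipper; the choice sigma = delta is an assumption."""
    med = statistics.median(x)
    delta = 3.0 * statistics.median(abs(v - med) for v in x)
    sigma = delta

    def phi(u):
        if abs(u) <= delta:
            return u
        tail = delta * math.exp(-((abs(u) - delta) ** 2) / (2.0 * sigma ** 2))
        return math.copysign(tail, u)

    return [phi(u) for u in x]


def cauchy(rng):
    """One standard Cauchy (alpha = 1, unit dispersion) sample."""
    return math.tan(math.pi * (rng.random() - 0.5))


def simulate_sensors(n, delay, atten, rng):
    """x(n) = s(n) + u(n), y(n) = atten * s(n - delay) + v(n)."""
    s = [rng.gauss(0.0, 1.0) for _ in range(n + delay)]
    x = [s[k + delay] + cauchy(rng) for k in range(n)]
    y = [atten * s[k] + cauchy(rng) for k in range(n)]
    return x, y


def estimate_delay(x, y, max_lag):
    """Return the lag m maximizing sum_n x(n) y(n + m) over |m| <= max_lag."""
    best_lag, best_val = 0, -math.inf
    for m in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -m), min(len(x), len(y) - m)
        r = sum(x[n] * y[n + m] for n in range(lo, hi))
        if r > best_val:
            best_lag, best_val = m, r
    return best_lag


if __name__ == "__main__":
    x, y = simulate_sensors(4000, 10, 0.9, random.Random(0))
    print("raw data:     ", estimate_delay(x, y, 20))
    print("after clipper:", estimate_delay(zmnl(x), zmnl(y), 20))
```

Without the clipper, a handful of very large noise samples can dominate the cross-correlation sum, which is why the raw estimate is unreliable for small $\alpha$.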

Example 6. We let $D = 10$, $A = 0.9$, $N = 4000$. As in [5], $s(n)$ was white Gaussian, with $\sigma_s^2 = 1$; $u(n)$ and $v(n)$ were iid S$\alpha$S processes with $\gamma = 1$. Figure 4 shows the performance (100 trials) of the proposed algorithm for $\alpha = 0.7, 1, 1.8, 2.0$.

As noted in [5], covariation-based techniques can be used only if $\alpha > 1$, and do not appear to work well for $\alpha < 1.5$. Two new techniques were reported in [5]. In one method the sensor signals are raised to a (possibly different) fractional power, $0 < p < \alpha/2$, and then cross-correlated. A second method minimizes the $p$-norm. Both these methods required $N = 20{,}000$ samples to obtain results comparable to ours. With ZMNL pre-processing, the correlation-based TDE worked extremely well.

3.3  DOA 

The approach used in the harmonic retrieval problem extends to the DOA problem, even for $\alpha < 1$; in contrast, the ROC-MUSIC approach of [14] is applicable only for $\alpha > 1$ (and, indeed, assumes that all the source signals and the noise are jointly S$\alpha$S with the same $\alpha$).

Example 7. The complex S$\alpha$S noise had unit dispersion; the two source signals were white Gaussian with unit variance. An eight-sensor ULA was assumed, and 500 snapshots were used. In the first case, the source bearings were ±5°, and in the second case ±10°. The bearings were estimated via correlation-based ESPRIT, both with (o) and without (+) the ZMNL pre-processing. Because of the poor performance of covariation-based ESPRIT in the harmonic retrieval problem, we did not consider it here. Means and standard deviations of the estimates, averaged over 100 runs, are shown in Figure 5, as a function of $\alpha$. The proposed estimator is seen to work very well.

The data-adaptive ZMNL of (8) is very effective in suppressing the effects of S$\alpha$S noise. Its performance is better than that of covariation-based techniques; more importantly, it works well for all kinds of noises: Gaussian, non-Gaussian, stable, finite variance. The colored noise case is handled along the lines in [9], and results comparable to those in the iid case are obtained.


4  DETECTION 

One of the classical approaches to detection in the presence of impulsive noise (weak signal) is to use a locally optimum detector (LOD). Designing a LOD requires knowledge of the noise pdf; an alternative is to use a ZMNL which captures the LOD behavior for a class of pdfs; this appears to work well even in S$\alpha$S noise [3]. The LOD non-linearity for S$\alpha$S rvs is linear near $x = 0$, with slope $\Gamma(3/\alpha)/\Gamma(1/\alpha)$; the linear approximation is valid for $x^{2} \ll \Gamma(1/\alpha)/\Gamma(3/\alpha)$; its tails are given by $(\alpha + 1)/|x|$. This partly explains the ‘robustness’ of Cauchy detectors. The ZMNL in (8) has an exponential tail, but works well. As we saw earlier, this ZMNL is very useful in the estimation context.
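The slope and tail behavior quoted above can be recovered with a short calculation, sketched here for unit dispersion ($\gamma = 1$). The notation $f(x)$ for the S$\alpha$S pdf and $g(x) := -f'(x)/f(x)$ for the LOD non-linearity is introduced only for this sketch.

```latex
f(x) = \frac{1}{\pi} \int_0^\infty e^{-v^{\alpha}} \cos(vx)\, dv,
\qquad
\int_0^\infty v^{k} e^{-v^{\alpha}}\, dv = \frac{1}{\alpha}\, \Gamma\!\left(\frac{k+1}{\alpha}\right)

f(0) = \frac{\Gamma(1/\alpha)}{\pi \alpha},
\qquad
-f''(0) = \frac{1}{\pi} \int_0^\infty v^{2} e^{-v^{\alpha}}\, dv = \frac{\Gamma(3/\alpha)}{\pi \alpha}

g(x) = -\frac{f'(x)}{f(x)} \approx \frac{-f''(0)}{f(0)}\, x
     = \frac{\Gamma(3/\alpha)}{\Gamma(1/\alpha)}\, x
\quad (x \to 0)

f(x) \sim C\, |x|^{-(\alpha + 1)}
\;\Longrightarrow\;
g(x) \approx \frac{\alpha + 1}{x}
\quad (|x| \to \infty,\ \alpha < 2)
```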

As we argued before, the ZMNL essentially transforms the impulsive noise to a more Gaussian-looking one; hence, standard detectors, which nominally assume additive Gaussian noise, can be used at the output of the ZMNL. We demonstrate this idea by considering a classical detection problem.

Example 8. The observed signal is $y(n) = A + u(n)$, $n = 1, \ldots, N$, where $u(n)$ is S$\alpha$S. We want to test whether or not $A = 0$ (i.e., detect the signal). The standard approach is to estimate the mean (or median) and see if it is statistically different from zero. When $u(n)$ is Gaussian, we have the Student t-test, which detects the signal if $\sqrt{N}\, |\hat{\mu}| / \hat{\sigma} > \tau$, where $\hat{\mu}$ and $\hat{\sigma}^{2}$ are sample estimates of mean and variance, and $\tau$ is a threshold (1.96 for $P_{fa}$ of 0.05). Obviously this test will perform poorly if $u(n)$ is S$\alpha$S. We passed the noisy data through the ZMNL in (8), and then applied the Student t-test, with $\tau = 1.96$. ROC curves for $\alpha = 1$ are shown in Fig. 6 (solid line: w/o ZMNL, ‘+’: with ZMNL). In these curves $P_{fa} = 0.05$ corresponds to $P_d = 0.9257$ (0.2728) with (without) the ZMNL when $N = 50$, and $P_d = 0.9965$ (0.2763) when $N = 100$. Note that for $N > 100$, the ROC with the ZMNL is essentially flat at $P_d = 1$. These curves were computed from a set of $10^{6}$ trials.
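A small Monte Carlo sketch of this experiment is given below. It is only an illustration under stated assumptions: the amplitude $A = 1$, the number of trials, and the clipper's tuning parameter (taken equal to the threshold) are assumed values rather than the settings used for Fig. 6, and the function names are hypothetical.

```python
import math
import random
import statistics


def zmnl(x):
    """Data-adaptive clipper; the choice sigma = delta is an assumption."""
    med = statistics.median(x)
    delta = 3.0 * statistics.median(abs(v - med) for v in x)
    sigma = delta

    def phi(u):
        if abs(u) <= delta:
            return u
        tail = delta * math.exp(-((abs(u) - delta) ** 2) / (2.0 * sigma ** 2))
        return math.copysign(tail, u)

    return [phi(u) for u in x]


def t_detect(y, tau=1.96):
    """Declare a detection when sqrt(N) |mean| / std exceeds tau."""
    stat = math.sqrt(len(y)) * abs(statistics.fmean(y)) / statistics.stdev(y)
    return stat > tau


def detection_rate(amplitude, n, trials, rng, use_zmnl):
    """Fraction of trials in which the t-test fires for Cauchy noise."""
    hits = 0
    for _ in range(trials):
        y = [amplitude + math.tan(math.pi * (rng.random() - 0.5))
             for _ in range(n)]
        if use_zmnl:
            y = zmnl(y)
        hits += t_detect(y)
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    print("Pd without ZMNL:", detection_rate(1.0, 50, 300, rng, False))
    print("Pd with ZMNL:   ", detection_rate(1.0, 50, 300, rng, True))
```

Setting `amplitude` to 0 in the same routine gives an empirical false-alarm rate, which is needed to compare the two detectors at equal $P_{fa}$.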

We repeated the experiment, again using default values for the $\phi(\cdot)$ in (8), and with fixed threshold $\tau = 1.96$, but varying $\alpha$. $P_d$ and $P_{fa}$ estimated from $10^{6}$ runs are shown as a function of $\alpha$ in Fig. 7. The solid line is $P_d$ using the ZMNL, and the +'s denote $P_d$ w/o the ZMNL. We remark that the performance can be further improved by tuning the parameters of the ZMNL.


5  CRAMER-RAO  BOUNDS 

Consider the parameter estimation problem for deterministic signals in iid S$\alpha$S noise. The observed signal is $y(n) = \mu c(n \mid \theta) + w(n)$, $n = 1, \ldots, N$, where $\mu$ and $\theta$ are unknown non-random constants, and $w(n)$ is iid S$\alpha$S. The waveform $c(\cdot)$ is completely specified by the vector $\theta$. The developments in [11, 12] (which consider the finite variance case) are directly applicable here. Let $I_f(\alpha)$ denote the Fisher Information Matrix (FIM) for location for a S$\alpha$S process with parameter $\alpha$ and unit dispersion, $\gamma = 1$. Then, the FIM for $\theta$ is given by


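$$[J(\theta)]_{ij} = \frac{\mu^{2} I_f(\alpha)}{\gamma_w^{2/\alpha}} \sum_{n=1}^{N} \frac{\partial c(n)}{\partial \theta_i}\, \frac{\partial c(n)}{\partial \theta_j}$$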

Consider the multiplicative noise model, $y(n) = w(n)\, c(n \mid \theta)$, where $w(n)$ is iid S$\alpha$S. Assume that $1 < \alpha$ and $E\{w(n)\} = \mu$. Let $I_\gamma(\alpha)$ denote the FIM for the squared scale parameter $\gamma^{2/\alpha}$ of a standard S$\alpha$S rv (i.e., with $\gamma = 1$). Following [12], the FIM is given by,

dciji)  1 

^  dOi  dOj  c2(n) 


II 

7w 

where we assume that $c(n) \ne 0\ \forall n$. The FIMs for location and scale are finite, and are shown in Fig. 8 as a function of $\alpha$ ($\gamma = 1$). Note that $I_f(\alpha)$ is essentially linear in $\alpha$, and $I_\gamma(\alpha)$ is weakly quadratic; notice the change in convexity at $\alpha = 1$. The FIMs were numerically evaluated via the STABLE program written by Prof. John Nolan at American University.

Now consider the parameter estimation problem for AR S$\alpha$S processes, $y(n) = -\sum_{k=1}^{p} a(k)\, y(n-k) + u(n)$. Following [11, 12], the FIM for the AR parameters is given by

$$J(a) = (N - p)\, I_f(\alpha)\, E\{u^{2}\}\, R_{hh},$$

where $R_{hh}$, the deterministic auto-correlation of the impulse response $h(n)$, and $I_f(\alpha)$ are both finite. But $E\{u^{2}\} = \infty$; hence, the FIM is singular. The argument extends to the general linear model. Hence, in the context of linear S$\alpha$S processes, it is important to establish rates of convergence of the estimator. Letting $C = \sigma^{2} R_{hh}$ as above, the FIMs in (10) and (16) of [11] hold for $\alpha > 1$.

6  CONCLUSIONS 

Normalized correlations and cumulants can be used to obtain consistent low-variance estimates of the ARMA parameters of linear S$\alpha$S processes. Simple ZMNL pre-processing helps in detection/estimation problems, when the noise is heavy-tailed, and perhaps $\alpha$-stable. Cramer-Rao bounds were derived for the parameter estimation problem. The performance of our algorithms was illustrated via various examples.
Fig 1. (Ex 4) Estimated IRs, $\mu \pm \sigma$ bounds; top: $\alpha = 0.5$, bottom: $\alpha = 1.5$


Fig 2. (Ex 5) Mean estimates of $f_1$ and $f_2$ vs $\alpha$


Fig 3. (Ex 5) Std-dev of estimates of $f_1$ and $f_2$ vs $\alpha$


Fig 4. (Ex 6) TDE w and w/o ZMNL (panel labels: robust, $\alpha = 1.8$, $\alpha = 2$)
Table 2: (Example 1) Mixed-phase MA estimates based on normalized $R_{xx}$ and $C_{4x}$.


Fig 8. FIM for location (top) and scale (bottom) vs $\alpha$, with $\gamma = 1$
Abstract

Cumulants have been successfully applied in the area of narrowband array signal processing. This motivates a performance analysis to find out the strengths and the weaknesses of each new algorithm. Hitherto, most of the known performance analyses are based on the asymptotic covariance of sample cumulants and are therefore called asymptotic performance analyses. Recently, explicit formulas for the finite-sample covariances of second-, third-, and fourth-order sample cumulants for any kind of signal, any kind of noise, any array shape and arbitrary sensors were derived [8], [9]. These formulas enable a finite-sample performance analysis.

In the single source case the steering vector is proportional to a vector built up by a product of second-order cumulants or by fourth-order cumulants. This means that the finite-sample (co)variance of the steering vector can be investigated by using the formulas for the finite-sample covariance of the second- and fourth-order sample cumulants. Hence, the open question “Which cumulants should be selected for steering vector estimation?” is addressed in this paper.


1  Introduction 

Since existing second-order statistics based methods cannot solve numerous problems in narrowband array signal processing, many higher-order cumulant based algorithms have been recently proposed. For example, the virtual-ESPRIT algorithm (VESPA) for direction-finding and recovery of independent sources can also calibrate an array of unknown configuration [5]. VESPA has also been extended for direction-finding of highly correlated or coherent sources [7], which is often the case in practice due to multipath propagation or “smart” jamming. Since higher-order cumulants are insensitive to additive colored or white Gaussian noise, the suppression of Gaussian noise is always accomplished by VESPA and extended VESPA. These new methods and algorithms require a performance analysis to find out their strengths and weaknesses.

*This work was supported by a scholarship of the NATO Scientific Board and co-ordinated by the Deutscher Akademischer Austauschdienst.

Some performance analyses for higher-order cumulant based methods have already been done. For example, Cardoso and Moulines [2] derived closed-form expressions for the asymptotic covariance of MUSIC-like direction-of-arrival (DOA) estimates based on two different fourth-order cumulant matrices. Yuen and Friedlander [11] have compared the performance of ESPRIT [10], higher-order ESPRIT [4], and the VESPA algorithm. Performance analyses are usually based on asymptotic covariances of second- or fourth-order sample cumulants; therefore, to provide a basis for investigation of the finite-sample case, we have recently derived the finite-sample covariance of second-, third-, and fourth-order sample cumulants [8], [9].

This paper emphasizes the finite-sample case to understand the excellent performance for modest numbers of snapshots observed by Dogan and Mendel and Gonen et al. [7]. Furthermore, investigating the finite-sample case also allows the analysis of adaptive algorithms. Another noticeable advantage of an analytic formula, compared to the often used Monte-Carlo simulations, is the easier and more reliable recognition of properties of the finite-sample covariance. This can be seen later in some examples.

The purpose of this paper is to study the accuracy of steering vector estimates based on second- or fourth-order sample cumulants as a function of the array shape, the DOA's, the kind and number of sensors, the kind of source and noise signals, the number of samples and the signal-to-noise ratio (SNR). Indeed, by using the finite-sample covariance formula, the necessary number of samples for a required estimation accuracy can be predicted. Alternatively, the optimal source signals in the sense of highest estimation accuracy can be obtained by optimization. The structure of the paper is as follows. In Section 2, we formulate the problem and introduce the necessary notation. An analysis of the single-source case is given in Section 3. In Section 4, some interesting examples are shown.

2  Problem  Formulation 

Let $P$ narrowband plane waves centered at a known frequency $\omega_0$ impinge on an arbitrary array composed of $M$ sensors. The received signal vector $\mathbf{r}(t) = (r_1(t), \ldots, r_M(t))^T$ can be modeled as

$$\mathbf{r}(t) = \mathbf{A}(\phi,\vartheta)\,\mathbf{s}(t) + \mathbf{n}(t), \tag{1}$$

where $\mathbf{s}(t) = (s_1(t), \ldots, s_P(t))^T$ is a $P \times 1$ vector which contains the $P$ zero-mean source signal complex envelopes at time $t$, and $\mathbf{A}(\phi,\vartheta)$ is an $M \times P$ steering matrix. $\phi \in [-\pi, \pi]$ is the azimuth angle, $\vartheta \in [0, \pi]$ is the elevation angle and $\mathbf{n}(t)$ is an $M \times 1$ vector composed of $M$ zero-mean noise signals. Any noise signal $n_m(t)$, $m = 1(1)M$,^1 is independent of any source signal $s_p(t)$, $p = 1(1)P$. Furthermore, all signals $s_p(t)$, $n_m(t)$ are modeled as sequences of independent and identically, but arbitrarily distributed complex random variables with finite moments up to the eighth-order. This model is commonly used in the far-field case, but it can be extended to coherent signals [7] and to the near-field case [3].

In this article, we focus our attention on second- and fourth-order cumulants (see [8] for third-order cumulants), which are respectively defined as

$$C_{k,l} = E\{r_k(t)\, r_l^*(t)\} \tag{2}$$

$$\begin{aligned}
C_{k,l,m,n} = {} & E\{r_k(t)\, r_l^*(t)\, r_m^*(t)\, r_n(t)\} \\
& - E\{r_k(t)\, r_l^*(t)\}\, E\{r_m^*(t)\, r_n(t)\} \\
& - E\{r_k(t)\, r_m^*(t)\}\, E\{r_l^*(t)\, r_n(t)\} \\
& - E\{r_k(t)\, r_n(t)\}\, E\{r_l^*(t)\, r_m^*(t)\},
\end{aligned} \tag{3}$$

since $E\{r_m(t)\} = 0\ \forall m = 1(1)M$. The second-order, $\hat{C}_{k,l}$, and fourth-order sample cumulant, $\hat{C}_{k,l,m,n}$, are obtained by replacing the expected values $E\{\cdot\}$ by time averages $\frac{1}{N}\sum_{t=1}^{N}$, where $N$ is the number of samples.

Hence, given a realization of the random vector $\mathbf{r}(t)$, the sample cumulants can be calculated. Note that although the estimator $\hat{C}_{k,l}$ is unbiased, the proposed estimator $\hat{C}_{k,l,m,n}$ is biased for the assumed model. In order to obtain an unbiased estimator, each sum of $\hat{C}_{k,l,m,n}$ must be multiplied with a proper constant (see [8]).
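The sample cumulants described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the 0-based sensor indices and the noise-free toy data (a hypothetical three-sensor steering vector and a short BPSK sequence) are not taken from the paper, and the fourth-order estimate is the plain (biased) version obtained by replacing expectations with time averages.

```python
def time_average(values):
    """Time average (1/N) * sum over the N samples."""
    values = list(values)
    return sum(values) / len(values)


def sample_c2(r, k, l):
    """Second-order sample cumulant of sensors k and l, cf. eq. (2)."""
    return time_average(x[k] * x[l].conjugate() for x in r)


def sample_c4(r, k, l, m, n):
    """Fourth-order sample cumulant of zero-mean data, cf. eq. (3)."""
    def cj(z):
        return z.conjugate()

    fourth = time_average(x[k] * cj(x[l]) * cj(x[m]) * x[n] for x in r)
    return (fourth
            - time_average(x[k] * cj(x[l]) for x in r)
            * time_average(cj(x[m]) * x[n] for x in r)
            - time_average(x[k] * cj(x[m]) for x in r)
            * time_average(cj(x[l]) * x[n] for x in r)
            - time_average(x[k] * x[n] for x in r)
            * time_average(cj(x[l]) * cj(x[m]) for x in r))


# Noise-free toy data: one BPSK source seen by M = 3 sensors (assumed values).
a = [1, 1j, -1]                      # hypothetical steering vector
s = [1, -1, -1, 1]                   # BPSK symbols
r = [[a_m * s_t for a_m in a] for s_t in s]

c2 = sample_c2(r, 0, 1)              # equals a_1 * conj(a_2) for this data
c4 = sample_c4(r, 2, 1, 0, 0)        # equals -2 * a_3 conj(a_2) conj(a_1) a_1
```

For a BPSK source the fourth-order cumulant of $s(t)$ is $-2$, which is why the fourth-order value in this toy example is $-2$ times the product of steering vector elements.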

We omit the tedious calculations and the formulas of the finite-sample covariances. For example, the finite-sample covariance of the fourth-order sample cumulants consists of 1 eighth-, 12 sixth-, 76 fourth-, and 132 second-order moments of $r_k(t)$, resulting in several thousand terms of moments of $s_p(t)$, $n_m(t)$ [8], [9]. All these formulas are published in different computer languages on the World-Wide-Web under
http:/ /f b9nt~ln.uiii-duisburg.de/mitarbeiter 
/kaiser /hoswshop . 97 /hoswshop97 . html. 

^1 $m = 1(1)M$ is a more compact form of $m = 1, 2, \ldots, M$. The number in parentheses means the increment of the sequence.


3  Analysis  of  the  Single-Source  Case 

In the single source case, the model (1) can be written as

$$\mathbf{r}(t) = \mathbf{a}(\phi,\vartheta)\, s(t) + \mathbf{n}(t), \tag{4}$$

where $\mathbf{a}(\phi,\vartheta) = (a_1(\phi,\vartheta), \ldots, a_M(\phi,\vartheta))^T$,

(5) 

$g_m(\phi,\vartheta)$ is the response, $(x_m, y_m, z_m)$ are the coordinates of the $m$-th sensor, $\lambda = 2\pi c/\omega_0$ is the source wavelength, and $c$ is the propagation speed of the plane wave.

Using some well-known properties of cumulants [6], it can be shown that

$$C_{k,l} = \gamma_{2,s}\, a_k(\phi,\vartheta)\, a_l^*(\phi,\vartheta), \quad k \neq l \tag{6}$$

and

$$C_{k,l,m,n} = \gamma_{4,s}\, a_k(\phi,\vartheta)\, a_l^*(\phi,\vartheta)\, a_m^*(\phi,\vartheta)\, a_n(\phi,\vartheta) \tag{7}$$

if $k, l, m, n$ are not all equal, or, if they are equal, $n_k(t)$ must be Gaussian distributed. This latter condition is due to the fact that higher-order cumulants of Gaussian random variables are zero [6]. $\gamma_{2,s}$ is the second- and $\gamma_{4,s}$ is the fourth-order cumulant of the random variable $s(t)$.

Eq.  (6)  suggests  estimating  the  steering  vector,  up 
to  the  scale  factor  7!^^  aj^(0, 1?)  oi  {<!>,  by 

/  Ci^M  \ 

C2,Af 

:  .  (8) 

\  Ci,Af  CM,Af-l  / 

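This can be easily proven by substituting $C_{k,l}$ for $\hat{C}_{k,l}$ and inserting (6). Similarly, an estimator based on fourth-order cumulants is obtained from eq. (7), up to the scale factor $\gamma_{4,s}\, a_k(\phi,\vartheta)\, a_l^*(\phi,\vartheta)\, a_m^*(\phi,\vartheta)$, as

$$\begin{pmatrix} \hat{C}_{k,l,m,1} \\ \vdots \\ \hat{C}_{k,l,m,M} \end{pmatrix}. \tag{9}$$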

The only restriction on $k, l, m$ in eq. (9) is that one of them must be unequal to another. This suggests collecting steering vector estimates for as many different indices as possible, and using the mean of them to reduce the variance. However, inserting a realization of $\mathbf{r}(t)$ into eq. (9), and rewriting it as a function of $\mathbf{a}(\phi,\vartheta)$, yields an expression of the form

$$f_{k,l,m}\, \mathbf{a}(\phi,\vartheta) + g_{k,l,m} \tag{10}$$

where $f_{k,l,m}$, $g_{k,l,m}$ are independent of $\mathbf{a}(\phi,\vartheta)$. Consequently, all these vectors are linearly dependent; thus, in the following $(k, l, m)$ is always fixed to $(3, 2, 1)$ and the indices of the estimate are discarded.
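As a rough illustration of the fourth-order based estimator of eq. (9) with the fixed indices $(k, l, m) = (3, 2, 1)$, the sketch below builds the vector of fourth-order sample cumulants and removes the unknown scale factor by dividing by its first element. The helper names, the half-wavelength uniform linear array, the 20 degree azimuth and the noise-free BPSK data are assumptions made for this example only; indices are 0-based in the code.

```python
import cmath
import math


def c4(r, k, l, m, n):
    """Fourth-order sample cumulant of zero-mean snapshots, cf. eq. (3)."""
    def avg(f):
        return sum(f(x) for x in r) / len(r)

    def cj(z):
        return z.conjugate()

    return (avg(lambda x: x[k] * cj(x[l]) * cj(x[m]) * x[n])
            - avg(lambda x: x[k] * cj(x[l])) * avg(lambda x: cj(x[m]) * x[n])
            - avg(lambda x: x[k] * cj(x[m])) * avg(lambda x: cj(x[l]) * x[n])
            - avg(lambda x: x[k] * x[n]) * avg(lambda x: cj(x[l]) * cj(x[m])))


def steering_estimate_c4(r, k=2, l=1, m=0):
    """Vector (C_{k,l,m,1}, ..., C_{k,l,m,M}) scaled so its first entry is 1."""
    num_sensors = len(r[0])
    v = [c4(r, k, l, m, n) for n in range(num_sensors)]
    return [v_n / v[0] for v_n in v]


# Assumed toy scenario: 4-element half-wavelength ULA, source at 20 degrees.
M = 4
phi = math.radians(20.0)
a = [cmath.exp(1j * math.pi * i * math.sin(phi)) for i in range(M)]
s = [1, -1, -1, 1, 1, -1]                     # BPSK symbols, no noise
r = [[a_m * s_t for a_m in a] for s_t in s]

a_hat = steering_estimate_c4(r)               # matches a / a[0] without noise
```

Without noise the estimate reproduces the steering vector exactly up to scale; with noisy data the same function returns a random vector whose finite-sample variance is the quantity studied in Section 4.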

We mentioned earlier an unbiased estimator for $C_{k,l,m,n}$; therefore, the scaled steering vector in eq. (9) can also be estimated without bias. In contrast, it is not possible to obtain an unbiased estimator from the second-order based estimate of eq. (8), because the bias

$$B\{\hat{C}_{k,l}\, \hat{C}_{m,n}\} = \frac{1}{N}\Big(E\{r_k(t)\, r_l^*(t)\, r_m(t)\, r_n^*(t)\} - E\{r_k(t)\, r_l^*(t)\}\, E\{r_m(t)\, r_n^*(t)\}\Big)$$

always remains a function of the fourth-order moment. This is the price paid for using the second-order based estimator.
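The bias of a product of two second-order sample cumulants can be checked with a short derivation. It is a sketch that assumes the sample cumulants are plain time averages over $N$ samples and that samples at different time instants are independent, as in the signal model of Section 2.

```latex
\begin{aligned}
E\{\hat{C}_{k,l}\,\hat{C}_{m,n}\}
  &= \frac{1}{N^2}\sum_{t=1}^{N}\sum_{\tau=1}^{N}
     E\{r_k(t)\,r_l^*(t)\,r_m(\tau)\,r_n^*(\tau)\} \\
  &= \frac{1}{N^2}\Big( N\,E\{r_k(t)\,r_l^*(t)\,r_m(t)\,r_n^*(t)\}
     + N(N-1)\,C_{k,l}\,C_{m,n} \Big), \\
E\{\hat{C}_{k,l}\,\hat{C}_{m,n}\} - C_{k,l}\,C_{m,n}
  &= \frac{1}{N}\Big( E\{r_k(t)\,r_l^*(t)\,r_m(t)\,r_n^*(t)\}
     - C_{k,l}\,C_{m,n} \Big).
\end{aligned}
```

The $N$ terms with $t = \tau$ contribute the fourth-order moment, while the $N(N-1)$ terms with $t \neq \tau$ factor into $C_{k,l}\,C_{m,n}$ by independence, so the bias decays as $1/N$ but depends on a fourth-order moment.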

4  Simulations  and  Analytic  Evaluations 

In this Section, we study the performance of the proposed steering vector estimators by Monte-Carlo simulation and analytic evaluation. In all the cases, $n_m(t)$ is Gaussian distributed for all $m = 1(1)M$, the number of realizations to obtain the simulated variance is fixed to 1000, the fourth-order cumulant indices are $(k, l, m) = (3, 2, 1)$, and one source with $E\{|s(t)|^2\} = 1$ is always assumed ($P = 1$). If nothing else is mentioned in the following, then $N = 20$, $M = 20$, SNR = 0 dB, a linear array of omnidirectional ($g_m(\phi,\vartheta) = 1$) sensors with spacing $\lambda/2$ is assumed, $s(t)$ is a BPSK-signal, and the azimuth and elevation direction of arrival are $\phi = 0^\circ$, $\vartheta = 0^\circ$. Due to the inherent scale factor, all

figures  show  the  normalized  variance  and  bias  of 
and  .  Note  also  that  always  the  mean 


of  the  normalized  variance  and  squared  bias,  respec¬ 
tively,  will  be  plotted  for  easier  comparison.  and 
are  defined  equivalently.  The  estimated  variance 

obtained from the Monte-Carlo simulations will be denoted as $\hat{V}^{(2,4)}$. The analytically evaluated quantities will always be plotted as dashed and solid lines, $\hat{V}^{(2)}$ as plus-signs, and $\hat{V}^{(4)}$ as circles.

4.1  Example  1:  Dependence  on  the  number  of 
samples  N 

One often mentioned deficiency of higher-order statistics is the increased number of data needed to achieve the same estimation accuracy obtained from second-order statistics; hence, an investigation of the number of samples $N$, which is chosen in this example as $N = 10(1)100$, is very interesting.


In  Fig.  1,  observe  tha,t  exhibits  not  only  a  lower 
mean-squared  bias  =  O^N),  but  also  a  lower 

mean  variance  than  This  is  a  surprising  result, 

especially  since  it  is  also  valid  for  very  small  N.  The 
good  agreement  between  the  simulation  and  analytic 
evaluation  confirms  this  result. 


Figure  1. 

4.2  Example  2:  Dependence  on  the  array 
shape 

In this example we have varied the location $x_1 = 0$, $y_1/\lambda = 0(0.1)2$ of the first sensor of a (non)-linear
array  composed  of  M  =  5  sensors.  Since  is  always 
very  small  compared  to  it  will  be  omitted  in  the 
following. 

Observe  from  Fig.  2,  that  and  are  inde¬ 
pendent  of  the  sensor  location.  Many  additional  exam¬ 
ples  with  non-linear  arrays,  which  are  not  shown  here, 
confirm  this  interesting  result.  We  will  take  up  this 
property again in the next example. Note that this independence cannot be reliably recognized from $\hat{V}^{(2)}$ and $\hat{V}^{(4)}$. For this set of parameters, fourth-order sample cumulants are always more accurate estimators than second-order sample cumulants for any array shape.

Figure  2. 


4.3  Example  3:  Dependence  on  the  direction- 
of-arrival 

Now the azimuth direction-of-arrival $\phi = -90^\circ(1^\circ)90^\circ$ is varied.


Figure  3. 

It  can  be  seen  from  Fig.  3,  that  and  are 

only  slightly  dependent  on  the  azimuth  direction-of- 
arrival.  Since  the  direction  of  arrivals  and  the  sensor 
location  only  affects  the  angle  of  the  steering  vector 
elements  for  omnidirectional  sensors  (see  eq.  (5)),  we 
investigate  now  the  dependence  of  and  on 

this  purpose  we  consider  a  linear  array 
composed  of  M  =  5  dipoles  with  gm{4>,‘0)  =  sm{^  - 
am),  where  (ai, -S'",  10"^)^  is 
the  orientation  vector  of  the  dipoles  [5]. 

Observe, from Fig. 4, that here the fourth-order sample cumulants are not always more accurate estimators than second-order sample cumulants. Especially in the region around $\phi = 0^\circ$, where the dipoles are very insensitive, $V^{(2)}$ is considerably lower than $V^{(4)}$. The dramatic increase in this region is due to the normalization, since the scaling factor is zero at $\phi = \alpha_1, \ldots, \alpha_M$.


Figure  4. 

A further example of practical interest is the behaviour of a linear array composed of $M = 20$ radar antennas with response [1] $g_m(\phi,\vartheta) = 1/\sin(\vartheta)\ \forall \vartheta \in (20^\circ, 70^\circ)$.


Figure  5. 


The results in Fig. 5 are in contrast to the dipole array. Here, $V^{(4)}$ is considerably lower for any elevation angle $\vartheta$ compared to $V^{(2)}$. The latter examples suggest that for direction-sensitive sensors fourth-order sample cumulants do not always lead to a higher estimation accuracy.


4.4  Example  4:  Dependence  on  the  number  of 
sensors 


In this example the number of sensors $M = 3(1)20$ is varied. Observe from Fig. 6 that $V^{(4)}$ is always lower than $V^{(2)}$. The difference between them is slightly increasing with increasing $M$.


Figure  6. 


4.5  Example  5:  Dependence  on  SNR  and  kind  of  signals 


Up to now we fixed the SNR to 0 dB and assumed a BPSK-signal. Yuen and Friedlander [11] have noticed that the use of 4-QAM signals instead of BPSK signals decreases the performance of fourth-order cumulant based algorithms compared to second-order cumulant based algorithms; therefore, at this point we will not only vary the SNR but also the kind of signal. Fig. 7 shows the logarithm of $V^{(2)}$ and $V^{(4)}$ for SNR = -20 dB (5 dB) 40 dB and for BPSK-, 4-QAM-, and 64-QAM signals.


Figure  7.  (Plotted quantities: $\log V^{(4)}$ and $\log V^{(2)}$.)


For BPSK-signals (lower dashed and solid curves) fourth-order sample cumulants give somewhat better results than second-order sample cumulants for moderate and high SNR; however, the opposite is true for 4-QAM (middle dashed and solid curves) and especially for 64-QAM (upper dashed and solid curves) signals, which confirms the statement of [11] also in the small sample case.


5  Conclusions 

In this paper we studied the finite-sample variance of steering vector estimates based on second- and fourth-order sample cumulants in the single source case. For BPSK-signals and moderate or high SNRs, fourth-order sample cumulants usually exhibit a lower finite-sample variance than second-order sample cumulants. This result is very general and also holds for other parameters, like the array shape, number of sensors, and especially the number of samples. On the other hand, for very low SNR-environments, or for 4-QAM and 64-QAM signals, second-order sample cumulants should be preferred. Of course, all these results are only valid for the conditions of the model defined in eq. (1), like, for example, the independence of the noise signals. Future work will investigate the finite-sample behaviour of VESPA for one source where noise-free second-order cumulants are calculated by using fourth-order cumulants. In addition, a combination of the second- and fourth-order based estimates, e.g. their mean, will be analyzed and the multiple source case will be considered. As mentioned in the problem formulation, the model in eq. (1) can be extended to the near-field and coherent sources cases, which can also be investigated using the tools of this paper and [8], [9].

ABSTRACT 

The recently reported estimator bank approach [1] is extended below to fourth-order direction finding algorithms. The essence of our approach is to exploit "parallel" underlying eigenstructure based estimators for removing the outliers and improving the direction finding performance in the threshold domain. Pseudo-randomly generated weighted fourth-order MUSIC estimators are exploited as underlying techniques for the estimator bank. Motivated by the superior performance and reduced computational complexity of beamspace and root modifications of the second-order eigenstructure techniques, beamspace root implementations of fourth-order MUSIC and the fourth-order estimator bank are developed. Simulations show dramatic improvements of the threshold performance.

1.  INTRODUCTION 

The performance of direction finding algorithms is known to degrade as the Signal to Noise Ratio (SNR) drops below a certain threshold or as the number of snapshots becomes small. This phenomenon, referred to as the threshold effect, is especially strong for higher-order methods [2]-[4]. Recently, a promising estimator bank approach [1], [5] that allows the SNR threshold to be lowered has been developed for second-order eigenstructure based algorithms. The algorithm proposed in [1], [5] has been referred to as the Pseudo-Random Joint Estimation Strategy (PR-JES). The essence of this approach is to exploit the additional information arising when several underlying Direction Of Arrival (DOA) estimators are calculated simultaneously for the same batch of data (for a single data record) [6]. To improve the performance, evolutionary principles are used, i.e. PR-JES chooses and exploits the most "successful" estimators (having no failure in the preliminary estimated source localization sectors) from the full set of underlying techniques.

This research was supported in parts by the SASPARC Project of INTAS, by the grant Bo 568/22-1 of DFG, and by Alexander von Humboldt Foundation.

In this paper, we extend this approach to the fourth-order direction finding algorithms based on the contracted quadricovariance [2], [4]. Weighted fourth-order MUSIC estimators with a pseudorandomly generated weighting matrix are introduced and exploited as underlying techniques for the fourth-order estimator bank. Motivated by the superior performance and reduced computational complexity of the second-order beamspace root eigenstructure techniques [7], beamspace root implementations of fourth-order MUSIC and the fourth-order estimator bank are then developed. Simulations with QAM sources and unknown colored Gaussian noise demonstrate very significant improvements of the threshold performance relative to fourth-order MUSIC.

2.  FOURTH-ORDER  MUSIC  AND 
WEIGHTED  MUSIC 

Consider a linear array of $n$ sensors. Assume that there are $q < n$ narrowband stationary zero-mean mutually uncorrelated far-field sources. The $i$th array vector snapshot can be modeled as [3], [4]

$$\mathbf{x}(i) = \mathbf{A}\,\mathbf{s}(i) + \mathbf{n}(i), \quad i = 1, 2, \ldots, M \tag{1}$$

where $\mathbf{A} = [\mathbf{a}(\theta_1), \ldots, \mathbf{a}(\theta_q)]$ is the $n \times q$ direction matrix, $\theta_1, \theta_2, \ldots, \theta_q$ are the signal DOA's, $\mathbf{a}(\theta)$ is the $n \times 1$ steering vector, $\mathbf{s}(i)$ is the $q \times 1$ vector of random non-Gaussian circular source waveforms, and $\mathbf{n}(i)$ is the $n \times 1$ vector of unknown colored Gaussian sensor noise.

The DOA estimation problem considered below is that given the measurements $\mathbf{x}(i)$, $i = 1, 2, \ldots, M$, the signal DOA's $\theta_1, \ldots, \theta_q$ should be estimated. Define the second- and fourth-order moments [3], [4]

$$\mu_2(l, k) = [\mathbf{R}]_{lk} = E[x_l(i)\, x_k^*(i)], \quad 1 \le l, k \le n \tag{2}$$

$$\mu_4(l, k, m, p) = E[x_l(i)\, x_k^*(i)\, x_m(i)\, x_p^*(i)], \quad 1 \le l, k, m, p \le n \tag{3}$$

where $\mathbf{R} = E\{\mathbf{x}(i)\,\mathbf{x}^H(i)\}$ is the $n \times n$ covariance matrix, $(\cdot)^H$ and $(\cdot)^*$ denote the Hermitian transpose and complex conjugate, respectively. The consistent sample estimates of (2) and (3) are given by

$$\hat{\mu}_2(l, k) = \frac{1}{M}\sum_{i=1}^{M} x_l(i)\, x_k^*(i) \tag{4}$$

$$\hat{\mu}_4(l, k, m, p) = \frac{1}{M}\sum_{i=1}^{M} x_l(i)\, x_k^*(i)\, x_m(i)\, x_p^*(i) \tag{5}$$

The quadricovariance is defined as a set of fourth-order cumulants [4] and can be written under the circularity assumption as

$$\kappa_4(l, k, m, p) = \mu_4(l, k, m, p) - \mu_2(l, k)\,\mu_2(m, p) - \mu_2(l, p)\,\mu_2(m, k). \tag{6}$$

The consistent estimate of (6) is given by

$$\hat{\kappa}_4(l, k, m, p) = \hat{\mu}_4(l, k, m, p) - \hat{\mu}_2(l, k)\,\hat{\mu}_2(m, p) - \hat{\mu}_2(l, p)\,\hat{\mu}_2(m, k) \tag{7}$$

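The contracted quadricovariance is defined as [2], [4]

$$C_{lk} = \sum_{m=1}^{n} \kappa_4(l, k, m, m). \tag{8}$$

Using (6), rewrite (8) in matrix notation as [4]

$$\mathbf{C} = E\big[\mathbf{x}^H(i)\,\mathbf{x}(i)\;\mathbf{x}(i)\,\mathbf{x}^H(i)\big] - \mathrm{Tr}(\mathbf{R})\,\mathbf{R} - \mathbf{R}^2 \tag{9}$$

The sample estimate of (9) is given by

$$\hat{\mathbf{C}} = \frac{1}{M}\sum_{i=1}^{M} \mathbf{x}^H(i)\,\mathbf{x}(i)\;\mathbf{x}(i)\,\mathbf{x}^H(i) - \mathrm{Tr}(\hat{\mathbf{R}})\,\hat{\mathbf{R}} - \hat{\mathbf{R}}^2 \tag{10}$$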
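A minimal sketch of the sample contracted quadricovariance in eq. (10) is given below in plain Python, with matrices stored as nested lists. The function name and the toy data (a single unit-modulus source on a hypothetical three-sensor array, no noise) are assumptions made for illustration; they are not taken from the paper.

```python
def contracted_quadricovariance(snapshots):
    """Sample contracted quadricovariance, cf. eq. (10).

    snapshots: list of M array snapshots, each a list of n complex numbers.
    """
    M = len(snapshots)
    n = len(snapshots[0])
    idx = range(n)

    # Sample covariance R = (1/M) sum_i x(i) x^H(i)
    R = [[sum(x[l] * x[k].conjugate() for x in snapshots) / M for k in idx]
         for l in idx]

    # First term: (1/M) sum_i (x^H x) x x^H
    def energy(x):
        return sum(abs(v) ** 2 for v in x)

    F = [[sum(energy(x) * x[l] * x[k].conjugate() for x in snapshots) / M
          for k in idx] for l in idx]

    trace_R = sum(R[m][m] for m in idx)
    R_squared = [[sum(R[l][m] * R[m][k] for m in idx) for k in idx]
                 for l in idx]

    return [[F[l][k] - trace_R * R[l][k] - R_squared[l][k] for k in idx]
            for l in idx]


# Assumed toy data: one unit-modulus source, unit-modulus steering vector.
a = [1, 1j, -1]
s = [1, -1, 1j, -1j]
X = [[a_m * s_t for a_m in a] for s_t in s]
C_hat = contracted_quadricovariance(X)
```

For this noise-free toy data the three terms evaluate to $n\,\mathbf{a}\mathbf{a}^H$ each, so the result is $-n\,\mathbf{a}\mathbf{a}^H$ with $n = 3$, a rank-one matrix of the form $\mathbf{A}\mathbf{Z}\mathbf{A}^H$.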

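It can be shown [4] that (9) can be represented as

$$\mathbf{C} = \mathbf{A}\,\mathbf{Z}\,\mathbf{A}^H \tag{11}$$

where $\mathbf{Z}$ is a $q \times q$ Hermitian matrix. According to (11), the structure of $\mathbf{C}$ is similar to that of the noise-free covariance matrix in the second-order case [4]. Hence, the conventional MUSIC algorithm can be exploited for DOA estimation. Express the eigendecomposition of (10) as

$$\hat{\mathbf{C}} = \sum_{i=1}^{n} \hat{\lambda}_i\, \hat{\mathbf{e}}_i\, \hat{\mathbf{e}}_i^H = \hat{\mathbf{E}}_S\, \hat{\boldsymbol{\Lambda}}_S\, \hat{\mathbf{E}}_S^H + \hat{\mathbf{E}}_N\, \hat{\boldsymbol{\Lambda}}_N\, \hat{\mathbf{E}}_N^H \tag{12}$$

where the $n \times \hat{q}$ and $n \times (n - \hat{q})$ matrices $\hat{\mathbf{E}}_S$ and $\hat{\mathbf{E}}_N$ contain the sample signal and noise subspace eigenvectors, respectively, and $\hat{q}$ is an estimate of $q$. In turn, the $\hat{q} \times \hat{q}$ and $(n - \hat{q}) \times (n - \hat{q})$ matrices $\hat{\boldsymbol{\Lambda}}_S$ and $\hat{\boldsymbol{\Lambda}}_N$ contain the signal and noise subspace sample eigenvalues, respectively. The contracted quadricovariance based MUSIC [2], [4] estimates the source DOA's as the locations of the $\hat{q}$ highest peaks of the spectral function

$$f_{\mathrm{CQ\text{-}MUSIC}}(\theta) = \big[\mathbf{a}^H(\theta)\, \hat{\mathbf{E}}_N\, \hat{\mathbf{E}}_N^H\, \mathbf{a}(\theta)\big]^{-1} \tag{13}$$

The weighted contracted quadricovariance MUSIC estimator can be defined as [5]

$$f_W(\theta) = \big[\mathbf{a}^H(\theta)\, \hat{\mathbf{E}}_N\, \mathbf{W}\, \hat{\mathbf{E}}_N^H\, \mathbf{a}(\theta)\big]^{-1} \tag{14}$$

where $\mathbf{W}$ is the $(n - \hat{q}) \times (n - \hat{q})$ nonnegative definite weighting matrix.

Figure 1: Concept of estimator bank.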

3.  ESTIMATOR  BANKS  AND  PR-JES 

The essence of the estimator bank approach [1], [5] is to generate multiple "independent" estimators for the same batch of data snapshots. To improve the performance, evolutionary principles are used, i.e. the underlying estimators compete with each other and only the most "successful" estimators (having no failure in the preliminary estimated source localization sectors) are used, while all failed estimators are attributed to outliers and removed from further consideration. The results of the "successful" estimators are then combined in the final estimate as shown in Fig. 1. Assume that arbitrary underlying estimators $f_l(\theta)$, $l = 1, 2, \ldots, K$ are calculated in a parallel manner, using the single batch of array data $\mathbf{x}(i)$, $i = 1, 2, \ldots, M$, or, equivalently, using the single estimate $\hat{\mathbf{C}}$ of the contracted quadricovariance matrix. Similarly to the second-order case, define the fourth-order estimator bank as [5]

$$\{ f_l(\theta),\ l = 1, 2, \ldots, K \} \tag{15}$$

It is suitable to generate the underlying estimators in a pseudorandom manner, using the set of weighted MUSIC estimators (14) with rank-one weighting matrices $\mathbf{W} = \mathbf{w}_l \mathbf{w}_l^H$, with $\mathbf{w}_l$ drawn from the Gaussian random generator

$$\mathbf{w}_l \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}), \quad l = 1, 2, \ldots, K \tag{16}$$

where $E[\mathbf{w}_l \mathbf{w}_k^H] = \delta_{lk}\mathbf{I}$, $E[\mathbf{w}_l \mathbf{w}_k^T] = \mathbf{0}$, $\mathbf{I}$ is the identity matrix, and $(\cdot)^T$ denotes the transpose. Using (16), as many "independent" estimators as necessary can be generated to satisfy the required compromise between computational cost and threshold performance. The choice of rank-one weighting matrices for (14) is dictated by computational reasons [5].
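The following sketch illustrates why rank-one weights are computationally convenient: with $\mathbf{W} = \mathbf{w}\mathbf{w}^H$ the quadratic form in (14) collapses to $|\mathbf{w}^H \hat{\mathbf{E}}_N^H \mathbf{a}(\theta)|^2$. It is a minimal illustration, not the authors' implementation. The half-wavelength uniform linear array, the helper names, and the hand-built noise subspace (constructed to be orthogonal to a hypothetical source at 10 degrees instead of being obtained from an eigendecomposition) are all assumptions of this example.

```python
import cmath
import math
import random


def steering(theta_deg, n):
    """ULA steering vector, half-wavelength spacing (assumed geometry)."""
    psi = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * psi * m) for m in range(n)]


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def orthonormalize(cols):
    """Gram-Schmidt orthonormalization of a list of column vectors."""
    basis = []
    for c in cols:
        for b in basis:
            coef = inner(b, c)
            c = [ci - coef * bi for ci, bi in zip(c, b)]
        norm = math.sqrt(inner(c, c).real)
        basis.append([ci / norm for ci in c])
    return basis


def draw_weight(dim, rng):
    """Draw w ~ CN(0, I): real and imaginary parts each have variance 1/2."""
    sd = math.sqrt(0.5)
    return [complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in range(dim)]


def weighted_null_spectrum(theta_deg, noise_basis, w):
    """a^H E_N w w^H E_N^H a = |w^H E_N^H a|^2; f_W in (14) is its reciprocal."""
    a = steering(theta_deg, len(noise_basis[0]))
    proj = [inner(e, a) for e in noise_basis]   # elements of E_N^H a
    return abs(inner(w, proj)) ** 2


# Hand-built noise subspace for n = 3 sensors and one source at 10 degrees.
u = cmath.exp(1j * math.pi * math.sin(math.radians(10.0)))
noise_basis = orthonormalize([[1, -u, 0], [0, 1, -u]])

rng = random.Random(0)                          # seeded for repeatability
w = draw_weight(2, rng)
at_source = weighted_null_spectrum(10.0, noise_basis, w)
elsewhere = weighted_null_spectrum(40.0, noise_basis, w)
```

Here `at_source` vanishes up to rounding for any weight vector, so every member of the bank has a peak at the true DOA, while the spectrum away from the source changes from one random weight to the next.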

Define the following hypothesis which is formulated for an arbitrary DOA estimator:

$\mathcal{H}$: The estimator spectral function has more than $q - 1$ separate spectral peaks localized in $\Theta_S$.

Here, $\Theta_S$ are the preliminary estimated angular sectors of source localization. Assume that the source localization sectors are specified as $L$ non-overlapping intervals

$$\Theta_S = [\theta_{a,1}, \theta_{b,1}] \cup [\theta_{a,2}, \theta_{b,2}] \cup \cdots \cup [\theta_{a,L}, \theta_{b,L}] \tag{17}$$

and that the estimate of the number of sources $\hat{q}$ is known. The PR-JES technique [1], [5] can be written for a given sample contracted quadricovariance matrix $\hat{\mathbf{C}}$ and $K$ pseudo-randomly generated underlying estimators as the following sequence of steps:

Step 1: Calculate the MUSIC estimator (13) and test the hypothesis $\mathcal{H}$ for this estimator alone. If $\mathcal{H}$ is accepted for this estimator then estimate the source DOA's as the locations of the $q$ highest peaks of its spectral function (13), and terminate the algorithm (i.e., go to step 4). If $\mathcal{H}$ is not accepted for the MUSIC estimator then go to the next step.

Step 2: Generate $K$ different random vectors $\mathbf{w}_l$, $l = 1, 2, \ldots, K$ using the random generator (16) and calculate the underlying DOA estimators (14). Denote their spectral functions $f_l(\theta)$, $l = 1, 2, \ldots, K$. As a result of this step, the estimator bank (15) is completed.

Step 3: Test the hypothesis $\mathcal{H}$ for each DOA estimator from the estimator bank.

• If $\mathcal{H}$ is accepted for any $J$ ($0 < J \le K$) estimators $f_l(\theta)$, $l = 1, 2, \ldots, J$ from the total number of $K$ estimators $f_l(\theta)$, $l = 1, 2, \ldots, K$ then estimate the $k$-th DOA $\theta_k$ as

$$\hat{\theta}_k = \mathrm{med}\{\theta_k^{(1)}, \ldots, \theta_k^{(J)}\}, \quad k = 1, 2, \ldots, q \tag{18}$$

where $\theta_1^{(l)} < \theta_2^{(l)} < \cdots < \theta_q^{(l)}$ is the ordered set of angles, corresponding to the $q$ highest peaks of function $f_l(\theta)$ that are localized in $\Theta_S$, and

$$\mathrm{med}\{\theta_1, \ldots, \theta_m\} = \begin{cases} (\tilde{\theta}_{m/2} + \tilde{\theta}_{m/2+1})/2, & \text{even } m \\ \tilde{\theta}_{(m+1)/2}, & \text{odd } m \end{cases} \tag{19}$$

Here $\{\tilde{\theta}_1, \ldots, \tilde{\theta}_m\} = \mathrm{sort}\{\theta_1, \ldots, \theta_m\}$, and $\mathrm{sort}\{\cdot\}$ denotes the operator of sorting in ascending (descending) order.

• If $\mathcal{H}$ is not accepted for all DOA estimators from the estimator bank then estimate the $k$-th DOA $\theta_k$ as

$$\hat{\theta}_k = \mathrm{med}\{\theta_k^{(1)}, \ldots, \theta_k^{(K)}\}, \quad k = 1, 2, \ldots, q \tag{20}$$

where $\theta_1^{(l)} < \theta_2^{(l)} < \cdots < \theta_q^{(l)}$ is the ordered set of angles, corresponding to the $q$ highest peaks of function $f_l(\theta)$ that are localized in the whole array field of view $[-90^\circ, 90^\circ]$.

Step 4: Stop.
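The decision logic of the steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: each estimator is represented only by the list of its spectral peak locations in degrees, ordered from the highest peak to the lowest, the function names are hypothetical, and every estimator is assumed to supply at least $q$ peaks.

```python
from statistics import median


def in_sectors(theta, sectors):
    """True if the angle lies in one of the localization sectors."""
    return any(lo <= theta <= hi for lo, hi in sectors)


def accepted(peaks, sectors, q):
    """Hypothesis test: more than q - 1 peaks fall inside the sectors."""
    return sum(1 for t in peaks if in_sectors(t, sectors)) >= q


def combine(bank, q, sectors=None):
    """Median over estimators of the k-th ordered angle, cf. (18)-(20)."""
    ordered = []
    for peaks in bank:
        kept = [t for t in peaks if sectors is None or in_sectors(t, sectors)]
        ordered.append(sorted(kept[:q]))      # q highest peaks, ascending
    return [median([est[k] for est in ordered]) for k in range(q)]


def pr_jes(music_peaks, bank, sectors, q):
    # Step 1: accept plain MUSIC if it shows no failure.
    if accepted(music_peaks, sectors, q):
        return sorted(music_peaks[:q])
    # Step 3: keep the "successful" estimators of the bank, if any.
    winners = [peaks for peaks in bank if accepted(peaks, sectors, q)]
    if winners:
        return combine(winners, q, sectors)   # eq. (18)
    return combine(bank, q)                   # eq. (20), whole field of view


sectors = [(-10.0, 10.0)]
bank = [[-4.0, 3.0, 50.0], [5.0, -6.0], [40.0, 1.0], [-5.0, 4.0, 20.0]]
estimate = pr_jes([30.0, 2.0], bank, sectors, q=2)
```

In this toy run MUSIC fails the test because only one of its peaks lies in the sector, the third bank member is discarded as an outlier, and the remaining three are combined by the median.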

Testing of $\mathcal{H}$ is used for detecting failures in the underlying estimates. It helps to remove the outliers and groups "successful" estimators. Step 1 guarantees the asymptotic behaviour of PR-JES to be not worse than that of MUSIC, i.e., if MUSIC is decided to have no failure within $\Theta_S$ then no more processing is necessary because MUSIC is known to provide excellent asymptotic performance. Otherwise, the estimator bank is used for lowering the SNR threshold.

The role of sector information (17) is very important. Each separate sector may include several unresolved sources without affecting the performance of the following processing steps. However, the performance of PR-JES may degrade if some of the true DOA's are not included in $\Theta_S$ or if these sectors are inadequately wide compared to the real source clusters. Note that similar sector information is used in beamspace processing and similar problems arise if this information is inadequate.

4.  BEAMSPACE  ROOT  IMPLEMENTATION 

Motivated by the success of the previous research on beamspace and root high-resolution methods and on combining beamspace and root algorithms into one scheme [7], let us develop a beamspace root implementation of fourth-order PR-JES for a Uniform Linear Array (ULA). The importance of the beamspace root implementation can be motivated by the following: i) polynomial rooting will provide significant computational savings compared with spectral search, ii) signal concentration in beamspace and polynomial rooting are expected to further improve the threshold and asymptotic performances and robustness against mismodeling [7], iii) the preliminary knowledge of source localization sectors can be simultaneously exploited for the hypothesis testing procedure in PR-JES, as well as for the design of the beamspace transformation matrix.

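Taking into account the analogy between contracted quadricovariance and covariance matrices, it is logical to apply the beamspace transformation directly to the sample contracted quadricovariance matrix (10). Hence, the $p \times p$ sample beamspace contracted quadricovariance matrix can be introduced as

$$\hat{\mathbf{C}}_B = \mathbf{T}^H\, \hat{\mathbf{C}}\, \mathbf{T} \tag{21}$$

where $\mathbf{T}$ is the $n \times p$ ($q < p < n$) beamspace transformation matrix satisfying $\mathbf{T}^H \mathbf{T} = \mathbf{I}$. Write the eigendecomposition of (21) as

$$\hat{\mathbf{C}}_B = \hat{\mathbf{U}}_S\, \hat{\boldsymbol{\Gamma}}_S\, \hat{\mathbf{U}}_S^H + \hat{\mathbf{U}}_N\, \hat{\boldsymbol{\Gamma}}_N\, \hat{\mathbf{U}}_N^H \tag{22}$$

where the $q \times q$ and $(p - q) \times (p - q)$ diagonal matrices $\hat{\boldsymbol{\Gamma}}_S$ and $\hat{\boldsymbol{\Gamma}}_N$ contain the signal and noise eigenvalues, and the columns of the $p \times q$ and $p \times (p - q)$ matrices $\hat{\mathbf{U}}_S$ and $\hat{\mathbf{U}}_N$ contain the signal and noise eigenvectors, respectively. Using the analogy between spectral and root estimators and between element and beam spaces, the beamspace root version of the contracted quadricovariance based MUSIC estimator (13) can be expressed as

$$f_{\mathrm{CQ\text{-}MUSIC}}(z) = \mathbf{a}^T(z^{-1})\, \mathbf{T}\, \hat{\mathbf{U}}_N\, \hat{\mathbf{U}}_N^H\, \mathbf{T}^H\, \mathbf{a}(z) \tag{23}$$

where $\mathbf{a}(z) = (1, z, \ldots, z^{n-1})^T$. The signal DOA's can be estimated from the roots of the polynomial (23) that are closest to the unit circle. In turn, we introduce the beamspace root version of the weighted MUSIC estimator (14) as

$$f_W(z) = \mathbf{a}^T(z^{-1})\, \mathbf{T}\, \hat{\mathbf{U}}_N\, \mathbf{W}\, \hat{\mathbf{U}}_N^H\, \mathbf{T}^H\, \mathbf{a}(z) \tag{24}$$

where $\mathbf{W}$ is the $(p - q) \times (p - q)$ weighting matrix.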

Consider  a  general  case  of  multiple  sectors  (multi¬ 
ple  source  clusters).  In  this  case,  different  beamspace 
transformation  matrices  are  required  for  each  sector, 
and  PR-JES  must  be  applied  to  the  underlying  poly¬ 
nomial  sets  rather  than  to  underlying  estimators  as  in 
Section  3.  I.e.,  each  underlying  estimator  in  the  estima¬ 
tor  bank  is  represented  now  as  a  polynomial  set  of  the 
dimension  L,  where  L  is  the  total  number  of  angular 
sectors  in  (17),  and  where  polynomials  from  the  same 
set  differ  by  the  beamspace  transformation  matrix  T. 

Consider first the polynomial set $\{f_i(z)\}_{i=1}^{L}$ associated with an arbitrary root estimator $f(z)$. Let the $i$th polynomial $f_i(z)$ from this set have $N_i$ roots (for exactness, hereafter we consider only the roots lying inside the unit circle) associated with the angles localized within the arbitrary interval $[\theta_{a,i}, \theta_{b,i}]$. The following hypothesis will be used in the sequel for sorting out the "successful" estimators:

$$\mathcal{H}:\ \sum_{i=1}^{L} N_i \ge q \tag{25}$$
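The root counting behind hypothesis (25) might be sketched as below. Several points are assumptions of this example rather than statements from the paper: (25) is read as "at least $q$ roots in total", a root is mapped to an angle through $z = e^{j 2\pi d \sin\theta}$ with element spacing $d = 0.5$ wavelengths, and the roots of each sector polynomial are supplied directly instead of being computed from (23) or (24). All function names are hypothetical.

```python
import cmath
import math


def root_angle_deg(z, spacing=0.5):
    """Map a root to a DOA, assuming z = exp(j*2*pi*spacing*sin(theta))."""
    s = cmath.phase(z) / (2.0 * math.pi * spacing)
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))


def roots_in_sector(roots, sector):
    """Roots inside the unit circle whose angles fall within the sector."""
    lo, hi = sector
    return [z for z in roots if abs(z) < 1.0 and lo <= root_angle_deg(z) <= hi]


def hypothesis_25(roots_per_sector, sectors, q):
    """Return (accepted, selected roots per sector) for one polynomial set."""
    selected = [roots_in_sector(r, s) for r, s in zip(roots_per_sector, sectors)]
    return sum(len(sel) for sel in selected) >= q, selected


def closest_to_unit_circle(selected, q):
    """DOA estimates from the q selected roots closest to the unit circle."""
    pool = [z for sel in selected for z in sel]
    pool.sort(key=lambda z: 1.0 - abs(z))
    return sorted(root_angle_deg(z) for z in pool[:q])


def root_at(radius, theta_deg):
    """Helper that places a root at a given radius and angle (toy data)."""
    return radius * cmath.exp(1j * math.pi * math.sin(math.radians(theta_deg)))


sectors = [(-10.0, 10.0), (20.0, 40.0)]
roots_per_sector = [
    [root_at(0.99, 5.0), root_at(0.5, 0.0), root_at(1.2, 3.0)],
    [root_at(0.95, 30.0), root_at(0.9, 5.0)],
]
ok, selected = hypothesis_25(roots_per_sector, sectors, q=2)
doas = closest_to_unit_circle(selected, q=2) if ok else None
```

In this toy case the first sector contributes two admissible roots and the second one, so the hypothesis is accepted and the two roots nearest the unit circle give the DOA estimates.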

Now,  we  are  able  to  formulate  the  beamspace  root 
implementation  of  PR-JES  which  can  be  represented 
as  the  following  sequence  of  steps. 

Step 1: For each interval $[\theta_{a,i}, \theta_{b,i}]$ compute the matrix $\mathbf{T}$ and the non-weighted beamspace MUSIC polynomial (23). Denote this polynomial $f_i(z)$. As a result of this step, $L$ different polynomials $f_i(z)$, $i = 1, 2, \ldots, L$ are available for different intervals $[\theta_{a,i}, \theta_{b,i}]$, $i = 1, 2, \ldots, L$.

Step 2: For each interval $[\theta_{a,i}, \theta_{b,i}]$, $i = 1, 2, \ldots, L$, find the roots $\{z_{i,1}, z_{i,2}, \ldots, z_{i,N_i}\}$ of $f_i(z)$ associated with the angles localized within $[\theta_{a,i}, \theta_{b,i}]$. Test the hypothesis (25). If (25) is accepted then estimate the source DOA's from the $q$ closest to the unit circle roots selected from the roots $\{z_{1,1}, z_{1,2}, \ldots, z_{1,N_1}, z_{2,1}, z_{2,2}, \ldots, z_{2,N_2}, \ldots, z_{L,1}, z_{L,2}, \ldots, z_{L,N_L}\}$ and stop (go to step 5). If (25) is not accepted then go to the next step.

Step 3: Generate $w_l$, $l = 1, 2, \ldots, K$ using (16) and compute $K$ underlying polynomials (24) for each interval $[\theta_{a,i}, \theta_{b,i}]$ using the previously computed beamspace transformation matrices for these intervals. Denote these polynomials $f_i^{(l)}(z)$, $i = 1, 2, \ldots, L$, $l = 1, 2, \ldots, K$. As a result of this step, $K$ polynomial sets $\{f_i^{(l)}(z)\}_{i=1}^{L}$, $l = 1, 2, \ldots, K$ are available.

Step 4: For each interval $[\theta_{a,i}, \theta_{b,i}]$, $i = 1, 2, \ldots, L$, and each polynomial $f_i^{(l)}(z)$ corresponding to this interval, find the roots $\{z_{i,1}^{(l)}, z_{i,2}^{(l)}, \ldots, z_{i,n_i^{(l)}}^{(l)}\}$ associated with the angles localized within $[\theta_{a,i}, \theta_{b,i}]$. Here $n_i^{(l)}$ is the number of such roots of the polynomial $f_i^{(l)}(z)$. For each polynomial set $\{f_i^{(l)}(z)\}_{i=1}^{L}$, set $N_i = n_i^{(l)}$ and test (25). If (25) is accepted for any $J$ ($0 < J \le K$) polynomial sets $\{f_i^{(l)}(z)\}_{i=1}^{L}$, $l = 1, 2, \ldots, J$ from the total number of $K$ polynomial sets, then estimate the $k$th DOA using (18), where $\theta_1^{(l)} < \theta_2^{(l)} < \cdots < \theta_q^{(l)}$ is the ordered set of angles associated with the $q$ closest to the unit circle roots selected from the roots $\{z_{1,1}^{(l)}, \ldots, z_{1,n_1^{(l)}}^{(l)}, z_{2,1}^{(l)}, \ldots, z_{L,n_L^{(l)}}^{(l)}\}$.

If (25) is not accepted for all $K$ polynomial sets $\{f_i^{(l)}(z)\}_{i=1}^{L}$, $l = 1, 2, \ldots, K$, then estimate the $k$th DOA using (20), where $\theta_1^{(l)} < \theta_2^{(l)} < \cdots < \theta_q^{(l)}$ is the ordered set of angles localized in the whole array field of view $[-90°, 90°]$ and associated with the $q$ closest to the unit circle roots selected from the overall number of roots of the polynomial set $\{f_i^{(l)}(z)\}_{i=1}^{L}$.

Step 5: Stop.
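The root-selection logic shared by Steps 2 and 4 can be sketched as follows. This is only an illustrative sketch: the function names `root_at` and `select_doas` are hypothetical, the hypothesis is taken to be "at least q admissible roots", and the mapping from a root's phase to a DOA assumes a half-wavelength ULA with $z = e^{j\pi\sin\theta}$, which the text above does not state explicitly.

```python
import cmath
import math


def root_at(radius, theta_deg):
    """Hypothetical helper: root with the given modulus whose phase encodes
    a DOA, assuming z = r * exp(j*pi*sin(theta)) (half-wavelength ULA)."""
    return radius * cmath.exp(1j * math.pi * math.sin(math.radians(theta_deg)))


def select_doas(roots, sector_deg, q):
    """Keep roots inside the unit circle whose angle lies in the sector,
    test the hypothesis 'at least q such roots', and return the DOAs (in
    degrees, ascending) of the q roots closest to the unit circle.
    Returns None when the hypothesis is rejected."""
    lo, hi = sector_deg
    candidates = []
    for z in roots:
        if abs(z) >= 1.0:
            continue  # only roots inside the unit circle are considered
        u = cmath.phase(z) / math.pi  # assumed to equal sin(theta)
        theta = math.degrees(math.asin(u))
        if lo <= theta <= hi:
            candidates.append((1.0 - abs(z), theta))
    if len(candidates) < q:
        return None  # hypothesis rejected: fall through to the next step
    candidates.sort()  # smallest distance to the unit circle first
    return sorted(theta for _, theta in candidates[:q])


if __name__ == "__main__":
    roots = [root_at(0.98, 10.0), root_at(0.95, 12.5), root_at(0.50, 15.0)]
    print(select_doas(roots, (3.5, 19.0), 2))
```

Returning `None` on rejection keeps the "go to the next step" branch explicit for the caller.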


5.  SIMULATION  RESULTS 

In our simulations, a ULA of ten omnidirectional sensors with half-wavelength spacing and two 4-QAM equal-power far-field sources impinging from $\theta_1 = 10°$, $\theta_2 = 12.5°$ were assumed. The number of independent
Figure 2: Experimental RMSE's of DOA estimation versus SNR. K = 20.

snapshots in each simulation run was M = 100. A total of 100 independent runs were performed to obtain each experimental Root Mean Square Error (RMSE) value for the simulated DOA estimators. The results were additionally averaged over the number of sources. When modeling PR-JES, the randomly generated vectors (16) were renewed in each simulation run (in order to examine the performance averaged over the random choice of weighting vector). The SNR was defined as the ratio of signal-to-noise powers in a single sensor. In our simulations, we always assumed that $\hat{q} = q$ and $\Theta = [3.5°, 19.0°]$. Unknown spatially colored Gaussian noise was modeled as a "spatially autoregressive" process [3] with the covariance matrix $Q$ parameterized by $\rho = 0.8$. The beamspace dimension p = 7 was assumed. The matrix T was designed within the sector $\Theta$ using the so-called "spheroidal sequences" approach [8].

Fig. 2 shows the DOA estimation RMSE's versus SNR. Elementspace spectral fourth-order MUSIC and PR-JES are compared with their beamspace root implementations. In this figure, the dimension of the estimator bank was fixed (K = 20). Fig. 3 compares the RMSE's of these fourth-order methods versus the dimension of the estimator bank K. In this figure, the SNR was fixed and equal to 5 dB. Both figures demonstrate that PR-JES performs dramatically better than MUSIC in the threshold domain. Interestingly, for known q our technique avoids the abrupt threshold phenomenon altogether. Fig. 2 verifies that the asymptotic (high SNR) performances of PR-JES and MUSIC are similar. From Fig. 3 one can observe that the performance of PR-JES can be improved via increasing


*  elementspace  spectral  higher-order  PR-JES 
o  beamspace  root  higher-order  PR-JES 


Figure  3:  Experimental  RMSE’s  of  DOA  estimation 
versus  K,  SNR  =  5  dB. 

the dimension of estimator bank. As expected, the beamspace root implementation of fourth-order MUSIC performs better in the threshold and asymptotic domains than elementspace spectral fourth-order MUSIC. Similarly, beamspace root PR-JES has better performance than elementspace spectral PR-JES.
Abstract

A unifying asymptotic performance analysis of a class of MUSIC algorithms for direction-of-arrival (DOA) estimation in the fourth-order cumulant domain (FOCD-MUSIC) is presented in this paper. A simple and explicit formula for the asymptotic variances of DOA estimation by FOCD-MUSIC's is given. The Cramér-Rao bound for DOA estimation in the fourth-order cumulant domain (FOCD-CRB) is also derived. The performances of three typical FOCD-MUSIC's and the conventional covariance-based MUSIC are compared. It is shown that the FOCD-MUSIC's are inefficient and that they are not superior to the conventional MUSIC algorithm in all cases. Nevertheless, the FOCD-MUSIC's outperform the conventional MUSIC with reduced variances and improved robustness when the spatial sources are closely spaced and the signal-to-noise ratios (SNR's) are relatively low. Simulations are included to validate the analytical results.

1.  Introduction 

In recent years, DOA estimation algorithms based on higher-order cumulants have drawn a lot of attention [3-9] due to their ability to improve the robustness of the covariance-based techniques. Pan and Nikias [3] were the first to report on the fourth-order cumulant-based MUSIC [2]. They used the diagonal slice to form a spatial cumulant matrix. Porat and Friedlander [4] used all spatial fourth-order cumulants to develop a MUSIC-like algorithm. Moulines and Cardoso [5] proposed the contracted quadricovariance by smoothing part of the spatial fourth-order cumulants. Chen and Lin [6] presented a class of fourth-order cumulant matrices to be applied to MUSIC. In [7] we developed a direction finding algorithm based on the maximal set of nonredundant fourth-order cumulants (MSNC algorithm). In [8] we proposed a Toeplitz approximation method in the fourth-order cumulant domain (CTAM algorithm). The common point of the above algorithms is that the true (infinite snapshots) cumulant matrices are similar to the true noise-free covariance matrix. Their differences, which result in different performance, lie in the way the estimated cumulant matrices are composed. We notice that these algorithms can be put into one category, that is, MUSIC algorithms in the fourth-order cumulant domain (FOCD-MUSIC's).

Few FOCD-MUSIC's have been evaluated analytically. Moulines and Cardoso [5] made an asymptotic analysis of the algorithms in [3] and [5], but the process and result of their analysis, compared to those presented in this paper, are very complicated. Fan and Younan [9] also made a statistical analysis of the algorithm in [3]. Their analysis is based on the assumption that the sample cumulant errors due to finite data length are uncorrelated random variables with identical variance. We point out in this paper that this assumption is generally not true, especially when the spatial emitters are closely spaced (e.g., less than half of the beamwidth).

In  this  paper,  we  present  a  unifying  and  explicit  analy¬ 
sis  of  the  class  of  FOCD-MUSIC’s.  The  Cramer-Rao 
lower  bound  for  DOA  estimation  in  fourth-order  cumulant 
domain  (FOCD-CRB)  is  also  derived.  Both  the  analysis 
and  the  derivation  are  based  on  the  statistical  characteriza¬ 
tion  of  the  sample  fourth-order  cumulants,  which  is  also 
presented  in  this  paper. 

2.  Problem  Formulation 

Assume P narrowband plane waves are incident on an arbitrary array of M (M > P) sensors. The output of the mth sensor of the array can be written as

$y_m(t) = \sum_{p=1}^{P} v_m(\theta_p)\, s_p(t) + w_m(t), \quad t = 1, \ldots, N,$  (1)

where N is the number of snapshots, $s_p(t)$ is the complex envelope of the pth source, $p = 1, \ldots, P$; $w_m(t)$ is the additive noise and $v_m(\theta) = \exp\{j k_\theta \cdot r_m\}$; $k_\theta$ and $r_m$ are the wave number and sensor location vectors, $m = 1, \ldots, M$. We make the following assumptions on the array outputs. (AS1): the $s_p(t)$'s are zero-mean stationary non-Gaussian processes with non-zero kurtosis; (AS2): the $w_m(t)$'s are zero-mean circular Gaussian processes; (AS3): the $s_p(t)$'s and $w_m(t)$'s are statistically independent among themselves and of each other. Based on these assumptions and the properties of cumulants, the true second- to eighth-order cumulants can be easily derived as follows:

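$R_y(k, m) = \sum_{p=1}^{P} \gamma_{2,p}\, v_k(\theta_p)\, v_m^*(\theta_p) + R_w(k, m)$  (2)

$C_{4y}(k_1, k_2, m_1, m_2) = \sum_{p=1}^{P} \gamma_{4,p}\, v_{k_1}(\theta_p)\, v_{k_2}(\theta_p)\, v_{m_1}^*(\theta_p)\, v_{m_2}^*(\theta_p)$  (3)

$C_{6y}(k_1, \ldots, k_3, m_1, \ldots, m_3) = \sum_{p=1}^{P} \gamma_{6,p}\, v_{k_1}(\theta_p) \cdots v_{k_3}(\theta_p)\, v_{m_1}^*(\theta_p) \cdots v_{m_3}^*(\theta_p)$  (4)

$C_{8y}(k_1, \ldots, k_4, m_1, \ldots, m_4) = \sum_{p=1}^{P} \gamma_{8,p}\, v_{k_1}(\theta_p) \cdots v_{k_4}(\theta_p)\, v_{m_1}^*(\theta_p) \cdots v_{m_4}^*(\theta_p)$  (5)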

The  sample  fourth-order  cumulant  can  be  estimated  by 
C!!y{ki,k2,mi,m2)  =  -jXy*.  (') 

+ S[  >”*1 

r,K=l 

+  yt,(Oy^(t)yii^(u)y^(u)  ] 
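As an illustration of how a fourth-order cumulant of the form $\mathrm{Cum}\{y_{k_1}, y_{k_2}, y_{m_1}^*, y_{m_2}^*\}$ can be estimated from N snapshots, the sketch below uses the usual zero-mean moment-to-cumulant relation with all three second-order product terms. It is an assumption that this matches the estimator (6) in detail; the function name `sample_cum4` and the data layout (a list of snapshots, each a list of complex sensor outputs) are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of (complex) numbers."""
    return sum(values) / len(values)


def sample_cum4(y, k1, k2, m1, m2):
    """Sample estimate of Cum{y_k1, y_k2, y_m1*, y_m2*} for zero-mean data.

    y is a list of N snapshots; each snapshot is a list of complex sensor
    outputs. Sensor indices are 0-based here (an illustrative choice)."""
    a = [snap[k1] for snap in y]
    b = [snap[k2] for snap in y]
    c = [snap[m1].conjugate() for snap in y]
    d = [snap[m2].conjugate() for snap in y]
    # Fourth-order moment.
    m4 = mean([p * q * r * s for p, q, r, s in zip(a, b, c, d)])
    # The three pairings of second-order moments that are subtracted.
    m_ab = mean([p * q for p, q in zip(a, b)])
    m_cd = mean([r * s for r, s in zip(c, d)])
    m_ac = mean([p * r for p, r in zip(a, c)])
    m_bd = mean([q * s for q, s in zip(b, d)])
    m_ad = mean([p * s for p, s in zip(a, d)])
    m_bc = mean([q * r for q, r in zip(b, c)])
    return m4 - m_ab * m_cd - m_ac * m_bd - m_ad * m_bc


if __name__ == "__main__":
    # Single sensor observing an alternating BPSK sequence (+1, -1, ...).
    bpsk = [[complex(1.0)], [complex(-1.0)], [complex(1.0)], [complex(-1.0)]]
    print(sample_cum4(bpsk, 0, 0, 0, 0))
```

For circular signals the term `m_ab * m_cd` vanishes asymptotically, but it is kept here because real-valued constellations such as BPSK are not circular.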

3.  Asymptotic  Characterization  of  Sample 
Fourth-Order  Cumulants 


The sample cumulant error due to finite snapshots is defined by

$\Delta C_{4y}(k_1, k_2, m_1, m_2) = \hat{C}_{4y}(k_1, k_2, m_1, m_2) - C_{4y}(k_1, k_2, m_1, m_2)$  (7)

Theorem 1. Under (AS1)-(AS3), the estimate of the fourth-order cumulant (6) is asymptotically unbiased. The asymptotic covariance of the sample cumulants is given by

$\lim_{N \to \infty} N\, E\{\Delta C_{4y}(k_1, k_2, m_1, m_2)\, \Delta C_{4y}(k_3, k_4, m_3, m_4)\} = G_8 + G_{62} + G_{44} + G_{422} + G_{2222}$  (8a)

that is

$\mathrm{AE}\{\Delta C_{4y}(k_1, k_2, m_1, m_2)\, \Delta C_{4y}(k_3, k_4, m_3, m_4)\} = (G_8 + G_{62} + G_{44} + G_{422} + G_{2222}) / N$  (8b)

where "AE" denotes asymptotic mean; $G_8$, $G_{62}$, $G_{44}$, $G_{422}$, and $G_{2222}$ are functions of cumulants:

Gs  =  Q);(^i,A:2,A:3,A:4,mi,/n2,/n3,/n4); 

^62  = 

Qy  (/:i ,  ,  ^3 ,  mi ,  m3 , 7714  )/2y  (^4 ,  m2 )  +  (^1 ,  ^ , /n2 ,  m3 ,  m4 ) /^  (^4 ,  mi ) 


+Qy(^i»^»A:4,mi,m3.m4)J^(A:3,m2)+C6y(*i,^,^4,m2,m3,m4)/?^(^,mi) 

G44  =C43,(A:i,fc2,/ni,/n3)C4^(^3,^4,m2,m4) 

■^C4y(A^i»  A^jmi,m4)C4ji  (A3,  ^4,  m2,  m3)  +  C^y{k^^k/^yTn2ytfi2^C^y(Jo^yk^yni^yiti^) 
+Qy  (^1 ,  A:2  » "*2 » "^4  )Qy  (^»  *4 » "h  >  ^  +  Qy  (A^ .  *2 » ^”3 » "*4  )Qy  (^ » ^4 » "*1  >  "*2  ) 
+C4y(A„A3,mi,m2)C4y(A^,A4,m3,m4)  +  C4y(A„^,mi,m3)C4y(A2,A4,m2,m4) 
"^Cijy  (A^i ,  A^ ,  Wj ,  m4 1^4  (Aij ,  A4,  m2 ,  m3  )  +  C^y{)^ ,  ^ ,  m2 ,  m3  )C4y  (A2 ,  A4 ,  mj ,  m4 ) 
+Qy(^i»  A^>  ^4)Qy(^»  A:4,mi ,  m3)  +  C4y(Ai ,  A3 ,  m3,  m4  )C4y(A2 ,  A4,  mi,  m2 ) 
*^^4y(A^i’ A^4,mi,m2)C4y(A2,  A3,  m3,m4)  +  C:4y(Ai,  A4,mi,m3)C4^(A^,  A3,m2,m4) 
‘^^4y(^i»A^4’  ^i’^4)^4y(A^»  ^4y(A^i»^4>f^»m3)C4y(A^,  A^,mi,m4) 

'^^4y(A^i,  Ar4,m2,m4)C4^(A^,/^,mi,m3)  +  C4y(Ai,  A4,m3,m4)Cr4^(A2,  A^,mi,m2)j 

^422  == 

Cjy  (A3,  A4  ,mi ,  m2)/?y(A|,  m3)/?y(^,m4)  +  C4y(A3,  A4,  mi  ,m2)/?y(Ai  ,m4)^y(/^  ,7713) 
+C4y  (Ai,  Aj ,  /713 ,  m4  )Ryik^ ,  mi  )Ry{k^ ,  7712 )  +  C4y  (Ai ,  A2 ,  m3 ,  m4  )/?y  (A3 ,  twj  )/?y  (A4,  mi ) 
+Qy(Ai,A3,771i,77l3)/fy(A2,m4)/fy(A4,77l2)+C4y(Ai,^,m2,m4)/ey(A2,77l3)/?y(A4,77ti) 
+C4y(Ai,A3,mi,m4)/?y(A2,77l3)/ey(A4,77l2)+C4y(Ai,A3,77l2,77l3)i?y(A2,m4)i?y(A4,77li) 
+C4y  ( A2 ,  A4 ,  mi ,  m3  )Ry  (Ai ,  m4  )/?y  (A3 , 7712  )  +  Qy  (A2 ,  A4, 7772,  m4  )^y  (A^l » ^3  )/fy  (A3 ,  mj  ) 
+C4y(A2,A4,m„m4)/?y(Ai,m3)/?y(A3,77i2)  +  C4y(A2,  A4,77i2,m3)/?y(A,,m4)/?y(A3,mi) 

+C4y  (Ai,  A4 ,  mi,  m3  )/?y(A2 ,  m4  )/?y  (A3,  tttj  )  +  C4y(Ai,  A4 ,  mj ,  m^)Ry  (k^ ,  m3  )/?y(A3 ,  mi ) 
+C4y  (Ai,  A4,  mi ,  m^)Ry{k2 ,  m3  )/?y  (A3 ,  tttj  )  +  C4y  (Aj ,  A4 ,  tttj,  m^)Ry  (A2,  m4  )/fy  (A3,  mi ) 
+C4y(A2,A3,mi,m3)/?y(Ai,m4)/?y(A4,77J2)  +  C4y(A2,A3,77i2,m4)i?y(Ai,m3)J?y(A4,m,) 
+C4y(A2,  A3,mi,m4)/fy(Ai,m3)/?y(A4,7772)  +  C4y(A2,  A3, 7772, 77i3)/?y(Ai,m4)/^(A4,  mi); 
^2222  = 

/^(Aj,  mi)^2y  (A4,  7772)/^(Ai.  7773)/ey(A2,m4)+ (A3,mi)/j;y  (A4, 77l2)/^(A2, 7773)/?y(Ai,  m4) 
+/?y(A4 ,777i)«y  (A3, 7772)/fy(Ai ,  TTia  )/ey  (A2 ,  m4 )  +  Ry(k^,mi  )/fy(A3, 7772)/fy  (A2  ,my)Ry(kiym^ ) 

where $R_y(\ldots)$, $C_{4y}(\ldots)$, $C_{6y}(\ldots)$ and $C_{8y}(\ldots)$ are second- to eighth-order cumulants of $y(t)$, as shown in (2)-(5).

Proof. The derivation of (8) is somewhat lengthy but it is straightforward by using (6) and the moment-to-cumulant formulas [1].

Remark: The statistical characterization of sample fourth-order cumulants of a single realization of a random process was discussed in [10] and [11]. In [4] an approximate expression for the covariance of the sample fourth-order cumulants of multiple realizations (snapshots) of the process used in array processing was derived, but that expression is a function of higher-order moments. The result presented in this paper is a function of higher-order cumulants and is convenient for computing in most applications such as array processing.

It is seen from the theorem that the sample cumulants cannot generally be regarded as uncorrelated random variables with equal variance. However, the joint distribution of sample cumulants is asymptotically Gaussian (not circular in general) [11, 12].

4.  Asymptotic  Performance  of  FOCD-MUSIC 

Let $\hat{C}$ be the estimated cumulant matrix used in FOCD-MUSIC. The true cumulant matrix has a general form of

$C = B \Gamma B^H$  (9)

where

$\Gamma = \mathrm{diag}[\gamma_{4,1}, \ldots, \gamma_{4,P}]$,  (10)

$\gamma_{4,p} = \mathrm{Cum}\{s_p(t), s_p(t), s_p^*(t), s_p^*(t)\}$  (11)

$B = [b(\theta_1), \ldots, b(\theta_P)]$  (12)

$b(\theta) = [b_1(\theta), \ldots, b_{M_c}(\theta)]^T$.  (13)

$b(\theta)$ is the steering vector in the cumulant domain and $M_c$ is the dimension of the cumulant matrix; both depend on the cumulant matrix specified by the algorithm.

Theorem 2. The FOCD-MUSIC DOA estimator is asymptotically unbiased. The asymptotic variance of $\hat{\theta}_p$ is given by
AE{(d0p)^}  = 


®ulf  AE{  AC  ®  dC*  }(«,,,  0  ) 

(A,.-A,.)(A,^-A,^) 


{ep,u,) 


(14) 

where  Mp  =  6p-9p  ,  <B>  denotes  Kronecker  product, 
/L,  ’s  and  «,•  ’s  are  eigenvalues  and  the  corresponding  nor¬ 
malized  eigenvectors  of  C  ,  with  I  A,  I  in  decreasing  order 
( A,  =0 for  i=  P+1, ...  M),  Up],  and 

AC  =  C-C,d(ep)  =  H0p)  (15) 

Gp,,  =-d\ep)u”b{ep)-b\ep)u^d{ep)  (i6) 

5(0p.E,)  =  2rf"(0,)(/«^ -Vp^mOp)  (17) 


Proof. The derivation of (14) requires some approximations on the perturbed signal eigenvectors [13]. The details of the derivation will be given in another paper [8] and are omitted here due to space limitation.

By using Theorem 1, the computation of (14) is straightforward once an algorithm of the FOCD-MUSIC class is specified.


5.  Cramer-Rao  Bound  in  Fourth-Order 
Cumulant  Domain 

The asymptotic Cramér-Rao lower bound for cumulant-based DOA estimation may be determined in the cumulant domain since the sample cumulants are asymptotically Gaussian distributed [12]. We choose all of the nonredundant sample cumulants as the "observations" in the cumulant domain without loss of information.


Theorem 3. The maximal set of nonredundant sample cumulants is given by $\Xi$:

$\Xi = \{\hat{C}_{4y}(k_1, k_2, m_1, m_2) \mid (k_1, k_2, m_1, m_2) \in \Omega\}$  (18)

$\Omega = \{(k_1, k_2, m_1, m_2) \mid k_1, k_2, m_1, m_2 \ \text{satisfy CONDS}\}$  (19)

where the CONDS is given by:

$k_1, m_1 \in [1, M]; \quad k_2 \in [1, k_1]; \quad m_2 \in [1, m_1]; \quad (m_1 - 1) m_1 + 2 m_2 \le (k_1 - 1) k_1 + 2 k_2$  (20)

The number of elements in $\Xi$ is Q,

$Q = M(M + 1)(M^2 + M + 2)/8.$  (21)

Proof. The result can be obtained by examining the symmetry of the fourth-order cumulants. (20) is a revised version of that in [7]. See [7] for more details.
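The count (21) can be checked by brute-force enumeration. The sketch below assumes the index conditions read as: $k_1, m_1 \in [1, M]$, $k_2 \in [1, k_1]$, $m_2 \in [1, m_1]$, and $(m_1 - 1)m_1 + 2m_2 \le (k_1 - 1)k_1 + 2k_2$. This reading is a reconstruction of (20) and the function names are hypothetical.

```python
def nonredundant_indices(M):
    """Enumerate index tuples (k1, k2, m1, m2) satisfying the assumed
    reading of the conditions (20) for an M-sensor array."""
    out = []
    for k1 in range(1, M + 1):
        for k2 in range(1, k1 + 1):          # symmetry in (k1, k2)
            for m1 in range(1, M + 1):
                for m2 in range(1, m1 + 1):  # symmetry in (m1, m2)
                    # Conjugate symmetry: keep one of each mirrored pair.
                    if (m1 - 1) * m1 + 2 * m2 <= (k1 - 1) * k1 + 2 * k2:
                        out.append((k1, k2, m1, m2))
    return out


def closed_form(M):
    """Q = M (M + 1) (M^2 + M + 2) / 8, as in (21)."""
    return M * (M + 1) * (M * M + M + 2) // 8


if __name__ == "__main__":
    for M in range(1, 7):
        print(M, len(nonredundant_indices(M)), closed_form(M))
```

The agreement follows because there are $n = M(M+1)/2$ unordered index pairs, and conjugate symmetry leaves $n(n+1)/2$ unordered pairs of pairs.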

Let  7:  We  stack  the 

A 

nonredundant  sample  cumulants  in  a  vector  ^ 

|  =  (22) 

Since  the  sample  cumulants  are  not  circular  Gaussian  in 
general,  we  create  a  2Q-dimensional  real  Gaussian  vector 

C  >  C  -  white  additive  Gaussian 

noise,  parameters  to  be  estimated  in  cumulant  domain  are: 
DOAs  O^iOp},  signal  cumulant 

Y  =  {Y2.p,y4,p,Y6.p,Ys,p),  P=i . P’  and  noise  power 

al,  .  Then  the  Fisher  information  matrix  (FIM)  [14]  is 


given  by 


/  = 


j0.e 

je.r 

j6,al 

jr.ol 

(^je,alyT 

_2  -2 

(23) 


where 


dRr 

JJi  ^  V 

dYi  ^  dYk  '7  ‘  -^v.  4  <7v. 


dYi  ^ 


dR. 


joUl  — 4-J?, 


dal 


2  "C 

jf.r  i  +  ltr{«->  ^Rf  (27) 

'•*  dOi  '•  dYk  2  ^  ddi  4  dYk 

jB.al,  _  1  »_f p-1  p-1  /'28'i 

2  1  -  dRr  .  dRy 

(29) 


(26) 
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where  7?^^  is  the  element  at  /th  row  and  Jtth  column  of 
a  a 

J  ’  ,  and  so  on.  It  is  easy  to  find  that 


Rr=- 

f  2 


,  (30) 


where 

Rj,  =  AE{d|d|«}  =  [AEMC4"  (T„  )(dC4^(T,^  ))*}]e,e 


(31) 

=  [AE{4C4^^(t,,  )AC^y{x^^  ))]q,q 


(32) 

Finally, we get the asymptotic Cramér-Rao lower bound for DOA estimation in the fourth-order cumulant domain (FOCD-CRB):

$\mathrm{AE}\{(\Delta\theta_p)^2\}_{\mathrm{FOCD\text{-}CRB}} = [J^{-1}]_{pp}, \quad p = 1, \ldots, P$  (33)

where $[J^{-1}]_{pp}$ represents the element at the pth row and pth column of $J^{-1}$. Using the result of Theorem 1, the FOCD-CRB can be evaluated numerically.


6.  Simulations 


Simulation  experiments  were  performed  on  a  uniform  lin¬ 
ear  array  (ULA)  separated  by  half  a  wavelength  of  the 
narrowband  signals.  Two  emitters  (P=2)  broadcast  BPSK 
waveforms  from0j  s=10®  and  =10°  with  respect  to  the 
broadside of the array. The additive noise is white Gaussian. Three algorithms within the class of FOCD-MUSIC, named MUSIC-like [4], MSNC [7] and CTAM [8], were evaluated and compared to the conventional MUSIC [2] and the FOCD-CRB by numerical evaluation of the theoretical results and by Monte Carlo simulations. Each of these simulations is based on 100 independent runs. Shown in Fig. 1 are the estimation mean-square-errors (MSE's) of $\theta_1$ versus signal-to-noise ratios (SNR's). The experiment condition is: number of sensors M = 4; number of snapshots N = 500. The close agreement between theoretical predictions and simulation results is clearly evident when the SNR is above 5 dB, and is good for the CTAM algorithm [8] even when the SNR is as low as -5 dB. The MSNC algorithm [7] performs almost identically to the MUSIC-like algorithm [4], while the former is less computationally expensive than the latter. It can be seen that the FOCD-MUSIC's outperform the conventional MUSIC [2] only when the SNR is somewhat low (less than approximately 10 dB). Fig. 2 and Fig. 3 plot the MSE's versus the number of snapshots (N) while the number of sensors (M) varies from 4 in Fig. 2 to 6 in Fig. 3 (which means the relative spatial separation between the two emitters is increased). Again, the analytical predictions compare favorably with the simulation results when the number of snapshots is not too low (greater than approximately 100). It is shown that the FOCD-MUSIC's outperform the conventional MUSIC when the spatial emitters are rather closely spaced (less than approximately half of the beamwidth). This is also shown by Fig. 4, where the MSE's are plotted versus the difference between the spatial angles of the two emitters.

7.  Conclusions 

A unifying asymptotic analysis of FOCD-MUSIC was presented and the FOCD-CRB was also derived. Simulation experiments validated the analytical results. Three typical FOCD-MUSIC's were evaluated and compared to covariance-based MUSIC and the FOCD-CRB. The MUSIC-like and MSNC algorithms have almost identical performance, as predicted in [7]. CTAM behaves rather robustly in all scenarios. The FOCD-MUSIC's are not superior to the conventional MUSIC algorithm in all cases. Nevertheless, they outperform the conventional MUSIC with reduced variances and improved robustness when the spatial sources are closely spaced and the signal-to-noise ratios (SNRs) are relatively low. The FOCD-MUSIC's are inefficient. It can be seen from Fig. 2 and Fig. 3 that the estimation variances decrease in parallel with the FOCD-CRB as the number of snapshots grows large, without reaching it, which means that the FOCD-MUSIC's are not asymptotically efficient.
Fig. 1 Mean-square-errors (MSE's) versus signal-to-noise ratios (SNR's) for $\theta_1 = 10°$. M = 4, N = 500.

Fig. 2 Mean-square-errors (MSE's) versus number of snapshots (N) for $\theta_1 = 10°$. M = 4, SNR = 5 dB.

Fig. 3 Mean-square-errors (MSE's) versus number of snapshots (N) for $\theta_1 = 10°$. M = 6, SNR = 5 dB.

Fig. 4 Mean-square-errors (MSE's) versus the angle differences of the two emitters for $\theta_1 = 10°$. M = 4, N = 500, SNR = 5 dB.
Abstract

The problem of separation of mixed sources is addressed in this paper. To solve this problem, at least as many observations as sources are needed. In particular, the number of sources can be unknown. The separation system is a linear network updated with a stochastic descent algorithm to minimize some separation criterion. A first algorithm separates sources with positive kurtosis while a second one separates sources with negative kurtosis. For both, the performance is independent of the mixture. Besides, in the noisy case, when there are more sensors than sources, the additional outputs merely generate noise.


1.  Introduction  and  problem  statement 

The unsupervised separation of several (p) independent zero-mean sources $s_i$ contained in several (m) observed linear mixtures $x_j$ has become a classical problem in signal processing. It has applications in mobile-radio communications, array processing, biological signal processing, feature classification, etc. In most previous contributions, it is assumed that the number p of sources has already been estimated [1],[3],[5]-[7],[13]. In [4], the case of an unknown p is considered but only in the noiseless context.

In [12] we have proposed an adaptive algorithm which can separate an unknown number of sources, thanks to some optimality criterion. But the results are restricted to sources with negative kurtosis. In this contribution, starting with the same criterion, we derive two other algorithms that are efficient with sources having positive or negative kurtosis, respectively. They both perform satisfactorily in the noisy case.

The mixture model under investigation is

$x = A s + w$  (1)

where $s = (s_1, \ldots, s_p)^T$, $x = (x_1, \ldots, x_m)^T$ and $A$ is the $m \times p$ mixture matrix. The $m \times 1$ vector $w$ accounts for the presence of a zero-mean, Gaussian noise vector that is assumed independent of the source vector. The problem is unsupervised when $s$, $A$ and $w$ are all unknown and only $x$ is observed. It is an inherent feature of this problem that the order of the sources $s_i$ cannot be recovered. We assume that the rank of $A$ is p, which implies that $m \ge p$. Consider a separation system with q outputs, where q is some a priori overestimate of p ($q \ge p$). The output vector is

$y = (y_1, \ldots, y_q)^T = H^T x$  (2)

where the $m \times q$ matrix $H$ characterizes the separating system. Denoting

$C \triangleq A^T H$  (3)

the $p \times q$ matrix of the overall system (mixture followed by separation system), the output in the noiseless case is

$y = C^T s.$  (4)

Clearly separation is achieved iff

$C = [\Lambda \ \ Q] P$  (5)

where $P$ is an arbitrary $q \times q$ permutation matrix. The bracket means concatenation of the $p \times p$ invertible diagonal matrix $\Lambda$ and the $p \times (q - p)$ matrix $Q$ whose columns contain at most one nonzero entry. Equation (5) means that all of the p sources $s_i$ are recovered on at least one output $y_j$ with arbitrary nonzero gain and arbitrary order, the other outputs being additional copies of certain sources.


Figure  1.  Overall  system:  mixture  matrix  fol¬ 
lowed  by  separating  system 
2 Separation criterion and algorithms

The  separation  criterion  is  to  minimize 

J{H)  =  Y,  }  -  2aLn(detff’ff)  (6) 

j=i 

where $\sigma$ and $\alpha$ are two positive scaling quantities, as already proposed in [12]. The first part of this criterion is a sum of simple extracting criteria [11], each associated with one output. The second term forces the rank of the matrix $H$ to be equal to q by avoiding the case where $H^T H$ is singular. In the noiseless case, it can be shown that under some mild condition, the separating points in $H$ are forced onto the minima of the criterion (6).

The stochastic gradient of the criterion (6) reads

$\nabla J = x y^T \mathrm{diag}(y y^T) - \sigma x y^T - \alpha H (H^T H)^{-1}$  (7)

where $\mathrm{diag}(y y^T) = \mathrm{diag}(y_1^2, \ldots, y_q^2)$. The increment of the associated adaptive algorithm is

$\Delta_0 H = -\mu \nabla J, \quad \mu > 0$  (8)


3.  Stationary  points 

The stationary points $\bar{H}$ of the two algorithms (11) are those of the associated ordinary differential equations (ODE)

$\frac{dH}{dt} = -E\{H \delta_r(y)\}, \quad r = 1, 2.$  (14)

It is interesting to investigate the corresponding transformed ODE

$\frac{dC}{dt} = -E\{C \delta_r(y)\}, \quad r = 1, 2.$  (15)

Separating stationary points. When $\sigma = 1$, it can be shown that the separating stationary points of (15) are

$\bar{C} = \Lambda [I \ \ Q] P$  (16)

where

• $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, and $\lambda_i$ is a root of

P{anA^-P{a?}A?-^  =  0,  (17) 


where $\mu$ is a (small) step-size. The computation of (7) is awkward since it requires inverting the matrix $H^T H$. In [2] and [1],[5], two different stochastic descent algorithms have been proposed with respective increments

$\Delta_1 H = -\mu H \nabla J^T H$  (9)

A2H 

(10) 

It follows from (7) that

$\Delta_r H = -\mu H \delta_r(y), \quad r = 1, 2$  (11)

where


• $r_j$ denotes the number of supplementary times the source $s_j$ is extracted. The $r_j$ only depend on initialization and satisfy $\sum_{j=1}^{p} r_j = q - p$.

•  the  p  X  (g  -  p)  matrix  Q  has  the  particular  form 


Q  = 


O-O  1--1 


0 


(18) 


\  0 


$\delta_1(y) = \nabla J^T H = \mathrm{diag}(y y^T)\, y y^T - \sigma y y^T - \alpha I$  (12)


when  9  >  p.  It  is  empty  when  g  =  p. 


Clearly $\Delta_1$ and $\Delta_2$ require less computation than $\Delta_0$ since the inversion of $H^T H$ has disappeared. Moreover, the increments (11) show an "equivariant" property according to the definition in [5]: in the noiseless case, the separation performance does not depend on the mixture $A$. This can be seen from the so-called "transformed" algorithm that is obtained for the overall system $C$ by premultiplying eq. (11) with $A^T$. The corresponding increment reads

$\Delta_r C = -\mu C \delta_r(y), \quad r = 1, 2$  (13)

which only involves the overall system $C$ and its output $y$, but neither $A$ nor $x$. Therefore the separation performance is not affected by a possible ill-conditioning of $A$, contrary to the gradient increment (8).
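One iteration of the increment (11) for r = 1 can be sketched as below. This is a minimal sketch under stated assumptions: $\delta_1(y)$ is taken to be $\mathrm{diag}(yy^T)\,yy^T - \sigma yy^T - \alpha I$ (a reading of (12)), signals are real-valued, and the helper names `delta1` and `step_delta1` as well as the list-of-lists matrix layout are hypothetical.

```python
def delta1(y, sigma, alpha):
    """Assumed form of delta_1(y): diag(y y^T) y y^T - sigma y y^T - alpha I.
    Entry (i, j) is y_i^3 y_j - sigma y_i y_j - alpha [i == j]."""
    q = len(y)
    return [[y[i] ** 3 * y[j] - sigma * y[i] * y[j]
             - (alpha if i == j else 0.0)
             for j in range(q)] for i in range(q)]


def step_delta1(H, x, mu, sigma, alpha):
    """One update H <- H - mu * H * delta_1(y), with y = H^T x.
    H is an m x q matrix stored as a list of m rows; x has length m."""
    m, q = len(H), len(H[0])
    # Output of the separating system: y = H^T x.
    y = [sum(H[i][j] * x[i] for i in range(m)) for j in range(q)]
    D = delta1(y, sigma, alpha)
    # Matrix product H * D (m x q), then the scaled subtraction.
    HD = [[sum(H[i][k] * D[k][j] for k in range(q)) for j in range(q)]
          for i in range(m)]
    return [[H[i][j] - mu * HD[i][j] for j in range(q)] for i in range(m)]


if __name__ == "__main__":
    H0 = [[1.0, 0.0], [0.0, 1.0]]
    print(step_delta1(H0, [1.0, 2.0], mu=0.1, sigma=1.0, alpha=1.0))
```

Note that the update never forms or inverts $H^T H$, which is the computational saving the text attributes to the increments (11).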


Clearly a source may be extracted more than once. However, all the sources are extracted at least once. For both algorithms (15), r = 1, 2, the study of stability for a separating stationary point provides a condition on the source statistics, but its expression is somewhat intricate. Fortunately, some sufficient (but not necessary) stability conditions can be stated easily:

Result 1: If all the sources have positive (resp. negative) kurtosis, the algorithm $\Delta_1 C$ (resp. $\Delta_2 C$) is stable at the separating stationary point (16),

where the kurtosis of a source $s_j$ is $\kappa_j = E\{s_j^4\} - 3E^2\{s_j^2\}$. Moreover, separation of sources whose kurtoses have different signs may be possible in some cases.
Non-separating stationary points. Both ODE (15) exhibit non-separating stationary points whose expressions are not given explicitly in this paper. However, they are unstable when the kurtosis condition of Result 1 holds.


4  Behaviour  of  the  algorithm  in  H 

In fact, the behaviour of the transformed algorithms (13) in $C$ does not always correctly describe the behaviour of the direct algorithms (11) written with $H$. This depends on whether $m = q = p$ or $m \ge q > p$, and on whether or not the system is affected by noise.


The case m = q = p. In this case $A$ is invertible. Hence there is a one-to-one correspondence between algorithms (11) and algorithms (13). Therefore, the convergence of $C$ to $\bar{C}$ implies the convergence of $H$ to $\bar{H} = A^{-T} \bar{C}$.


The  case  m>  q  >  p 

• The noise-free case. We have shown that

$\Delta H = M \Delta C + N$  (19)

where $M$ is an $m \times p$ constant matrix and $N$ is an $m \times q$ stochastic matrix with nonzero mean. Consequently, $E\{\Delta C\} = 0$ does not imply $E\{\Delta H\} = 0$. In other words, although the separating stationary state is reached by $C$, the matrix $H$ still wanders and never reaches any stationary point.

There exists a linear subspace $\mathcal{H}$ of dimension $(m - p)$ in which the matrices $H$ are such that $\bar{C} = A^T H$. Since the algorithms (11) minimize $-\mathrm{Ln}(\det H^T H)$, this term tends to $-\infty$. So $H$ wanders in $\mathcal{H}$ until $\det H^T H$ becomes infinite. Consequently, the entries of $H$ diverge.


• The noisy case. The m noise components, assumed independent, can be viewed as additional sources. Therefore, the new number of independent sources is p' = p + m and the condition m > p' no longer holds. Perfect separation of the p' sources is then impossible. However, by considering a low noise power, it can be shown that after convergence, the separating system outputs read (up to a permutation of indices)

$$y_i = \begin{cases} s_i + \epsilon_1^T s + \epsilon_2^T w, & i = 1, \ldots, p \\ \gamma^T w + \epsilon_3^T s, & i = p+1, \ldots, q \end{cases} \quad (20)$$

where $\gamma$ denotes a vector and $\epsilon_1$, $\epsilon_2$, $\epsilon_3$ are three vectors with small entries. Hence a quasi-separating stationary state is reached. Note that as m grows relative to p, the norm of the $\epsilon$ vectors decreases.


5.  Simulations 

In order to confirm the theoretical results, two simulations have been run with p = 2 sources, m = q = 4 sensors, and for a mixture matrix

$$A = \begin{pmatrix} 1.0 & 0.5 \\ 0.4 & 1.0 \\ 0.9 & 0.4 \\ 0.3 & 1.1 \end{pmatrix} \quad (21)$$

The noise components $w_i$ are zero-mean, independent Gaussian variables with power 0.01. Simulation 1 (resp. 2) corresponds to the increment $\Delta_1 H$ (resp. $\Delta_2 H$) and uses 2 sources with positive (resp. negative) kurtosis whose distribution is depicted in Fig. 2(a) (resp. Fig. 2(b)). The histograms of the mixtures $x_j$ are depicted in Fig. 3 (resp. Fig. 6). The 4 output signals $y_j$ are plotted in Fig. 5 and Fig. 8 respectively, which show that separation is reached after approximately 2000 iterations for the two algorithms. The histograms of the output signals are depicted in Fig. 4 and Fig. 7 respectively. In simulation 1 the sources are recovered by outputs $(y_1, y_4)$ while in simulation 2 the sources are recovered by outputs $(y_2, y_3)$. The residual noise components are noticeable in the histograms of the recovered sources. Besides, the histograms of the remaining outputs exhibit the characteristic Gaussian shape corresponding to a mixture of noise components. As a result, rough a priori knowledge of the source statistics makes the sources easy to detect in number and value.
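
The simulation setup can be sketched as follows. This is only an illustrative Python fragment that forms one snapshot of the noisy mixture $x = A s + w$ with the matrix (21) and noise power 0.01; the function name `mix`, the seeded generator, and the example source values are assumptions, and the adaptive separation algorithms themselves are not reproduced here.

```python
import random

# Mixture matrix (21): m = 4 sensors, p = 2 sources.
A = [[1.0, 0.5],
     [0.4, 1.0],
     [0.9, 0.4],
     [0.3, 1.1]]

def mix(sources, noise_power=0.01, rng=None):
    """Return one snapshot x = A s + w with independent Gaussian noise w."""
    rng = rng or random.Random(0)          # seeded only for reproducibility
    sigma = noise_power ** 0.5             # noise standard deviation
    return [sum(a * s for a, s in zip(row, sources)) + rng.gauss(0.0, sigma)
            for row in A]

# One noisy snapshot for a hypothetical pair of source values.
print(mix([1.0, -1.0]))
```

Setting `noise_power=0.0` gives the noise-free mixture, which is convenient for checking the mixing step in isolation.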


6.  Conclusion 


Two new simple adaptive separation algorithms are presented in this paper. Both are derived from the same optimization criterion, and both work with an unknown number of sources. The first one separates sources with positive kurtosis while the second one works with sources having negative kurtosis. For low additive Gaussian noise, each source is recovered once. If there are more sensors than sources, the other outputs recover some combinations of the noise components. Therefore, the number of sources is easily estimated.
Figure 3. Simulation 1: histograms of the mixtures $x_j$

Figure 2. Probability distribution functions of the two sources: (a) with positive kurtosis $\kappa_i = 2$ (b) with negative kurtosis $\kappa_i = -1.36$
ABSTRACT




This paper is concerned with the problem of near-field source localization. The problem is tackled using the method of blind separation of independent signals (sources) from their linear instantaneous (memoryless) mixtures. The various signals are assumed to be zero-mean non-Gaussian but not necessarily linear or i.i.d. Approaches using higher-order cumulants are developed using the fourth-order normalized cumulants of the "beamformed" data. The instantaneous mixture matrix is obtained by cross-correlating the extracted inputs with the observed outputs. The first approach is source-extractive, i.e., the sources are extracted and cancelled one by one. The other approach is based upon cumulant matching of the estimated and model-based cumulants parametrized by the source parameters (range, bearing and cumulant). Illustrative simulation examples are provided.


1.  INTRODUCTION 

This paper is concerned with the problem of near-field source localization using blind separation of independent signals (sources) from their linear instantaneous (memoryless) mixtures. The problem consists of extracting the sources without any distortion from the observation of unknown linear mixtures of the unknown sources. Such a model arises in a wide variety of array processing applications. In this paper we study the problem of near-field localization via blind source separation. Prior work on this problem includes [1] and [2]. The general source separation problem has been considered in [5]-[7], [11].

It has been pointed out in [1] and [2] that the second-order statistics based methods (such as that in [9], and others; see [1], [2]) are computationally complex, requiring nonlinear optimization whose convergence to the desired solution is not guaranteed. Moreover, they yield biased estimates in spatially correlated Gaussian measurement noise. In [1] and [2], two methods have been proposed for near-field source localization. One of them is a search-free solution (TLS-ESPRIT) and the other is a cumulant-matching based solution which requires nonlinear optimization and which is initialized using the TLS-ESPRIT solution for reliable convergence. In this paper we investigate alternatives to [1]-[2]. Use of blind separation for near-field localization has been suggested in [4].

2.  SYSTEM  MODEL 

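Given measurements $y_i(k)$ $(i = 1, 2, \ldots, N)$ at time k at N sensors, let these measurements be a linear instantaneous mixture of M source signals $s_m(k)$ $(m = 1, 2, \ldots, M)$:

$$y_i(k) = \sum_{m=1}^{M} f_{im}\, s_m(k) + n_i(k), \quad i = 1, 2, \ldots, N, \quad (1)$$

$$\Leftrightarrow \quad y(k) = F s(k) + n(k), \quad (2)$$

This work was supported by the National Science Foundation under Grant MIP-9312559.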


where $y(k) = [y_1(k) \; y_2(k) \; \cdots \; y_N(k)]^T$, similarly for $s(k)$ and $n(k)$, $s_m(k)$ is the m-th input at sampling time k, $y_i(k)$ is the i-th output, $n_i(k)$ is the additive Gaussian measurement noise, and $f_{im}$ is the scalar memoryless transfer function with $s_m(k)$ as the input and $y_i(k)$ as the output. We will call the matrix F (with im-th component $f_{im}$) the mixing matrix. We allow all of the above quantities to be complex-valued. The vector sequence $s(k)$ is assumed to be zero-mean and spatially independent (i.e., the various source signals are independent). Also assume that the fourth-order cumulant or the third-order cumulant of $s_m(k)$ is nonzero $\forall m$. In blind source separation problems the mixing matrix F is unknown and the problem is to recover $s(k)$ (up to scaling) given the data.

Consider a uniform linear array with inter-element distance d and number of sensors N. The sensors are assumed to be omnidirectional and identical, indexed as 1, 2, ..., N. The impinging signals arrive from M narrowband, co-channel sources with center frequency $\omega_0$ rad/sec and wavelength $\lambda$. Let the m-th emitter's signal arrive at an angle $\theta_m$ w.r.t. the array broadside and let this emitter be at a range $r_m$, both these parameters being measured w.r.t. the sensor '1'. Under these assumptions $f_{im}$ is given by [1], [2], [9]

$$f_{im} := \exp\{j\beta_{im}\} \quad (3)$$

$$\beta_{im} := \omega_m (i-1) + \phi_m (i-1)^2, \quad (4)$$

$$\omega_m := -2\pi \frac{d}{\lambda} \sin(\theta_m), \qquad \phi_m := \pi \frac{d^2}{\lambda r_m} \cos^2(\theta_m). \quad (5)$$
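
To make the near-field model (3)-(5) concrete, here is a small Python sketch that builds one column of the mixing matrix F. It assumes all lengths are expressed in wavelengths (so $\lambda = 1$) and uses a zero-based sensor index equal to $i - 1$; the function name `steering_column` and the example geometry are illustrative choices.

```python
import cmath
import math

def steering_column(n_sensors, theta_deg, range_wl, d_wl):
    """Entries f_im = exp{j[omega_m (i-1) + phi_m (i-1)^2]} for i = 1..N.

    theta_deg: bearing w.r.t. broadside in degrees
    range_wl:  source range in wavelengths
    d_wl:      inter-element spacing in wavelengths
    """
    theta = math.radians(theta_deg)
    omega = -2.0 * math.pi * d_wl * math.sin(theta)                # eq. (5), first part
    phi = math.pi * d_wl ** 2 / range_wl * math.cos(theta) ** 2    # eq. (5), second part
    return [cmath.exp(1j * (omega * i + phi * i * i)) for i in range(n_sensors)]

# Hypothetical example: 11 sensors, d = lambda/4, source at (20 degrees, 1.8 lambda).
column = steering_column(11, 20.0, 1.8, 0.25)
print(column[0], abs(column[5]))
```

Every entry has unit modulus and the entry for the reference sensor equals 1, so only the phase carries the bearing and range information.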

The near-field localization problem consists of estimation of the m-th source azimuth angle $\theta_m$ and range $r_m$ given the observation vector $y(k)$, $k = 1, 2, \ldots, T$.

The following conditions have been imposed in [1]-[2] to arrive at their solution:

(SC-1) The N-element array is uniform and linear with spacing $d \le \lambda/4$ and moreover, N - 2 > M.

(SC-2) The range parameters of the sources are not equal, i.e., $\phi_k \ne \phi_l$ for $k \ne l$ in (5).

(SC-3) The source signals $s_m(k)$ are zero-mean, mutually independent, non-Gaussian, narrowband, stationary sources with non-zero kurtosis. The sensor noise $n_i(k)$ is independent of the source signals and is Gaussian.

We  provide  a  solution  based  upon  the  following  conditions: 

(GT-1) The N-element array is uniform and linear with spacing $d \le \lambda/2$ and moreover, N > M.

(GT-2) The source signals $s_m(k)$ are zero-mean, mutually independent, non-Gaussian, narrowband, stationary sources with non-zero kurtosis such that $(\theta_m, r_m) \ne (\theta_n, r_n)$ for $n \ne m$. The sensor noise $n_i(k)$ is independent of the source signals and is Gaussian.

Note in particular that for the same number of array elements we allow for a larger aperture (hence better accuracy): compare (SC-1) with (GT-1), where we allow $d = \lambda/2$. As a matter of fact we consider two solutions:

• Iterative Source Separation Using Inverse Filter Criteria. Here we follow [3] and [8] to estimate a scaled and permuted version of F (see also [11]). Note that unlike [3] and [8] we only use spatial filtering; there is no temporal filtering as the mixture in (1)-(2) is instantaneous. This method yields columns of F. Next each extracted column of F is used to estimate the $\theta_m$ and $r_m$ pair using (3)-(5). Here we first use a 'crude' closed-form solution followed by nonlinear optimization using the closed-form solution as an initial guess.

• Cumulant matching. Here we estimate F (or three parameters, range, bearing and signal cumulant, per source) via cumulant matching using the iterative source separation solution as an initial guess. Also we use all cumulant combinations, not just a few as in [2]. The advantage of cumulant matching is that the resulting solution is consistent in noise.


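3. ESTIMATION CRITERIA

The iterative separation step is based on estimation using higher-order statistics. In the rest of this section the noise $n(k)$ in (1) is assumed to be negligible. Let $\mathrm{CUM}_4(w)$ denote the fourth-order cumulant of the complex-valued zero-mean random variable w, defined as

$$\mathrm{CUM}_4(w) := \mathrm{cum}_4\{w, w^*, w, w^*\} = E\{|w|^4\} - 2\left[E\{|w|^2\}\right]^2 - \left|E\{w^2\}\right|^2. \quad (6)$$

The notation $\gamma_{4sm} := \mathrm{CUM}_4(s_m(k))$ and $\sigma_{sm}^2 := E\{|s_m(k)|^2\}$ is used. Consider a 1 × N row vector $C^T$, with its i-th entry denoted by $c_i$, operating on the data vector $y(k)$. Let the product be denoted by $e(k)$. Then it follows that

$$e(k) = \sum_{i=1}^{N} c_i\, y_i(k) = \sum_{m=1}^{M} h_m\, s_m(k), \quad (7)$$

where

$$h_m := \sum_{i=1}^{N} c_i f_{im} \quad \text{and} \quad H := [h_1 \; h_2 \; \cdots \; h_M]^T = F^T C. \quad (8)$$

From [10], the fourth-order cumulant can be expressed as

$$\mathrm{CUM}_4(e(k)) = \sum_{m=1}^{M} \gamma_{4sm} |h_m|^4 = \sum_{m=1}^{M} \tilde{\gamma}_{4sm} |\tilde{h}_m|^4, \quad (9)$$

$$E\{|e(k)|^2\} = \sum_{m=1}^{M} \sigma_{sm}^2 |h_m|^2 = \sum_{m=1}^{M} |\tilde{h}_m|^2, \quad (10)$$

where

$$\tilde{h}_m := \sigma_{sm} h_m \quad \text{and} \quad \tilde{\gamma}_{4sm} := \frac{\gamma_{4sm}}{\sigma_{sm}^4}. \quad (11)$$

Consider the cost function

$$J_{42}(C) := \frac{|\mathrm{CUM}_4(e(k))|}{\left[E\{|e(k)|^2\}\right]^2}. \quad (12)$$

Define $|\tilde{\gamma}_{4\max}| := \max_{1 \le m \le M} |\tilde{\gamma}_{4sm}|$. We get from (9)

$$|\mathrm{CUM}_4(e(k))| \le |\tilde{\gamma}_{4\max}| \sum_{m=1}^{M} |\tilde{h}_m|^4 \le |\tilde{\gamma}_{4\max}| \left[\sum_{m=1}^{M} |\tilde{h}_m|^2\right]^2 \quad (13)$$

with equality if and only if

$$h_m = d\,\delta(m - m_0), \quad m_0 \in \{1, 2, \ldots, M\}, \quad (14)$$

where d is some complex constant and $m_0$ is such that $|\tilde{\gamma}_{4sm_0}| = |\tilde{\gamma}_{4\max}|$. Note that (14) follows from the fact that $\sum_{m=1}^{M} |\tilde{h}_m|^4 = \left[\sum_{m=1}^{M} |\tilde{h}_m|^2\right]^2$ iff (14) is true for some $m_0$. Thus we have

$$J_{42}(C) \le |\tilde{\gamma}_{4\max}|, \quad (15)$$

with equality iff (14) holds with $m_0$ such that $|\tilde{\gamma}_{4\max}| = |\tilde{\gamma}_{4sm_0}|$. Thus, when (14) holds true, (7) reduces to

$$e(k) = d\, s_{m_0}(k), \quad (16)$$

i.e., the processed output is a possibly scaled version of one of the sources. This can be utilized to separate the sources.

3.1. Does such a solution exist?

It follows from (11) and (14) that

$$\sum_{i=1}^{N} c_i f_{ij} = \begin{cases} d & \text{if } j = j_0 \\ 0 & \text{if } j = 1, 2, \ldots, M, \; j \ne j_0. \end{cases}$$

In matrix notation we have

$$F^T C = E \quad (17)$$

where

$$E = [0 \; 0 \; \cdots \; 0 \; d \; 0 \; \cdots \; 0]^T.$$

Note that the M × 1 column vector E has the nonzero entry in its $j_0$-th row with zeros everywhere else. A set of sufficient conditions for the existence of the desired solution is: N > M and F has full column rank. These conditions are satisfied if (GT-1) and (GT-2) hold true.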
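
The fourth-order cumulant of a complex zero-mean variable, $E\{|w|^4\} - 2[E\{|w|^2\}]^2 - |E\{w^2\}|^2$, is the basic quantity behind the cost (12). A minimal sample-based Python sketch follows; it simply replaces expectations by sample averages, and the function name `cum4` and the toy constellations are assumptions made for illustration.

```python
def cum4(w):
    """Sample estimate of cum{w, w*, w, w*} for zero-mean complex data w."""
    n = len(w)
    m4 = sum(abs(x) ** 4 for x in w) / n   # E{|w|^4}
    m2 = sum(abs(x) ** 2 for x in w) / n   # E{|w|^2}
    p2 = sum(x * x for x in w) / n         # E{w^2}, nonzero for real-valued (e.g. BPSK) data
    return m4 - 2.0 * m2 ** 2 - abs(p2) ** 2

bpsk = [1, -1, 1, -1]        # real binary symbols
qpsk = [1, 1j, -1, -1j]      # circular four-phase symbols

print(cum4(bpsk), cum4(qpsk))
```

The term $|E\{w^2\}|^2$ matters for non-circular signals such as BPSK, which is why the two constellations give different normalized cumulants even though both have unit power.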


3.2.  Stationary  Points 

Suppose that we choose to optimize (12) via a gradient-based optimization method. Then we will converge to a stationary point of (12). The set of stationary points of the cost $J_{42}(C) = G_{42}(h)$ w.r.t. the composite response $h_j$ is the set of solutions to

$$\frac{\partial G_{42}(h)}{\partial h_m} = 0. \quad (18)$$

From (9), (10), (12) and (18) we have either $h_m = 0$ or

$$\tilde{\gamma}_{4sm} |\tilde{h}_m|^2 = \frac{\mathrm{CUM}_4(e(k))}{E\{|e(k)|^2\}} \quad (19)$$

where the right-hand side of (19) is evaluated at the solution to (18) under consideration. Since the cost $G_{42}(h)$ $(= J_{42}(C))$ is invariant to any scaling of h or of C, in practice this ambiguity is resolved by requiring unit norm for the equalizer tap gains (i.e., C); see Sec. 4. This rules out the solution $h_j = 0 \; \forall j$ to (18). Therefore, we will confine our attention to solutions that have at least one nonzero element $h_j$.

Let P be the number of nonzero components of $h := [h_1, \cdots, h_M]^T$ for which (18) holds true. For a prespecified P (P = 1, 2, ..., M), the solutions to (18) are all vectors $h^{(P)}$ such that

$$|\tilde{h}_m^{(P)}|^2 = \begin{cases} \rho_P / \tilde{\gamma}_{4sm} & \text{if } m \in I_P \\ 0 & \text{otherwise} \end{cases} \quad (20)$$

where $\rho_P = \mathrm{CUM}_4(e(k)) / E\{|e(k)|^2\}$ is evaluated at $h^{(P)}$ and $I_P$ is any P-element subset of $\{1, 2, \ldots, M\}$.

By perturbation analysis (see also [3]) we can show that all the vectors $h^{(P)}$, P = 2, 3, ..., M, are unstable equilibria (saddle points) of $G_{42}(h)$. Next we examine the second-order derivatives of $G_{42}$ at $h^{(1)}$, which implies a solution of the type (14). By expanding $G_{42}(h)$ in a Taylor series around $h^{(1)}$ we can establish that the class of solutions such as (14) (for any choice of $j_0$) is a "2-D plane" of locally stable equilibria (local maxima): any perturbation in (14), except for the one in d, only decreases $G_{42}(h)$ below $G_{42}(h^{(1)})$; changes in d alone leave it unchanged.

The above discussion pertains to the stationary points of $J_{42}$ w.r.t. the composite response $h_j$. However, in practice, we can only adjust the beamformer weights. So what about the stationary points of $J_{42}$ w.r.t. the beamformer coefficients? This aspect is discussed next. Using (8) and conditions (GT-1)-(GT-2), it is easy to show that $\partial J_{42}(C)/\partial c_m = 0$ for m = 1, 2, ..., N implies that

$$\frac{\partial G_{42}(h)}{\partial h_j} = 0 \quad \text{for } j = 1, 2, \ldots, M. \quad (21)$$

Thus, all stationary points of $J_{42}$ w.r.t. the beamformer coefficients are described by the stationary points of $G_{42}$ w.r.t. the composite response $h_j$.

Finally, the above discussion carries over to $J_{32}(C)$.


4.  COST  OPTIMIZATION 

An iterative, batch, steepest descent method is used to optimize (12). The cost $J_{42}(C)$ is invariant to any scaling of C. To rectify this, at every iteration the equalizer tap gain vector C is normalized to unit norm. Let $\hat{J}_{42}(C)$ denote the data-based (12) with its explicit dependence upon C. Let C denote the initial guess which sets all taps to be equal and unit norm. For calculation purposes, (6) is substituted in (12) to get

$$\hat{J}_{42}(C) = \frac{\left| \frac{1}{T}\sum_{k=1}^{T} |e(k)|^4 - 2\left[\frac{1}{T}\sum_{k=1}^{T} |e(k)|^2\right]^2 - \left|\frac{1}{T}\sum_{k=1}^{T} e^2(k)\right|^2 \right|}{\left[\frac{1}{T}\sum_{k=1}^{T} |e(k)|^2\right]^2}. \quad (22)$$

The steps taken to optimize the cost $\hat{J}_{42}(C)$ are

(i) Set the step size $\rho = 1$.

(ii) Calculate $C' = C + \rho \nabla_{C^*} \hat{J}_{42}(C)$ and the resulting cost $\hat{J}_{42}(C')$.

(iii) If $\hat{J}_{42}(C') > \hat{J}_{42}(C)$, then accept $C'' = C'/\|C'\|$ as the new equalizer vector, set $C \leftarrow C''$, and then go to (i). Else set $\rho = \rho/2$ and go to (ii).
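
The step-halving search (i)-(iii) can be sketched in Python as below. This is a toy illustration under several assumptions that are not in the paper: the gradient is approximated by central finite differences rather than a closed form, the data are noise-free, the two sources are binary with all four sign combinations appearing once (so their sample statistics are exactly independent), and the array geometry (3 sensors, $d = \lambda/2$) is hypothetical.

```python
import cmath
import math

def steering(n, theta_deg, r_wl, d_wl):
    """Column of F from the near-field model; lengths in wavelengths."""
    th = math.radians(theta_deg)
    omega = -2 * math.pi * d_wl * math.sin(th)
    phi = math.pi * d_wl ** 2 / r_wl * math.cos(th) ** 2
    return [cmath.exp(1j * (omega * i + phi * i * i)) for i in range(n)]

def j42(c, data):
    """Data-based cost |CUM4(e)| / (E|e|^2)^2 with e(k) = C^T y(k)."""
    e = [sum(ci * yi for ci, yi in zip(c, y)) for y in data]
    n = len(e)
    m4 = sum(abs(x) ** 4 for x in e) / n
    m2 = sum(abs(x) ** 2 for x in e) / n
    p2 = sum(x * x for x in e) / n
    return abs(m4 - 2 * m2 ** 2 - abs(p2) ** 2) / m2 ** 2

def gradient(c, data, h=1e-6):
    """Finite-difference ascent direction dJ/dRe(c_i) + j dJ/dIm(c_i)."""
    grad = []
    for i in range(len(c)):
        partials = []
        for step in (h, 1j * h):
            plus, minus = list(c), list(c)
            plus[i] += step
            minus[i] -= step
            partials.append((j42(plus, data) - j42(minus, data)) / (2 * h))
        grad.append(complex(partials[0], partials[1]))
    return grad

def maximize_j42(data, n_taps, max_iter=2000):
    c = [complex(1 / math.sqrt(n_taps))] * n_taps   # equal taps, unit norm
    cost = j42(c, data)
    for _ in range(max_iter):
        g = gradient(c, data)
        rho, accepted = 1.0, False                  # step (i)
        while rho > 1e-12:
            trial = [ci + rho * gi for ci, gi in zip(c, g)]   # step (ii)
            trial_cost = j42(trial, data)
            if trial_cost > cost:                   # step (iii): accept, renormalize
                norm = math.sqrt(sum(abs(t) ** 2 for t in trial))
                c, cost, accepted = [t / norm for t in trial], trial_cost, True
                break
            rho /= 2                                # otherwise halve the step
        if not accepted:
            break
    return c, cost

N = 3
f1 = steering(N, 40.0, 3.0, 0.5)
f2 = steering(N, -30.0, 5.0, 0.5)
signs = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
data = [[f1[i] * s1 + f2[i] * s2 for i in range(N)] for s1, s2 in signs]
c_opt, cost = maximize_j42(data, N)
h = [sum(ci * fi for ci, fi in zip(c_opt, f)) for f in (f1, f2)]
print(round(cost, 6), [round(abs(v), 4) for v in h])
```

For unit-power binary sources the bound (15) equals 2, so a cost close to 2 together with a composite response `h` dominated by a single entry indicates that one source has been isolated.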


5.  ITERATIVE  SOURCE  SEPARATION 

Maximization of (12) w.r.t. the vector C leads to the solution (16). Given (16) we can estimate and remove the contribution of $s_{m_0}(k)$ from (1). Then we have a linear mixture with N measurements but M - 1 sources (instead of M sources as in (1)-(2)). Repeat the process, i.e., maximize (12) w.r.t. a new set of weights C to get a solution $e(k) = d' s_{m_1}(k)$ where $m_1 \in (\{1, 2, \ldots, M\} - \{m_0\})$. A simple cross-correlation technique may be used for extracted source cancellation; a by-product of this is estimates of the gains $f_{im_0}$ $(i = 1, 2, \ldots, N)$, with the extracted $s_{m_0}(k)$ as input and the N measurements as outputs. This leads to the following procedure:

1. Maximize (12) w.r.t. C to obtain (16).

2. Cross-correlate $\{e(k)\}$ (of (16)) with the given data (1) and define the estimates

$$\hat{f}_{im_0} := \frac{E\{y_i(k) e^*(k)\}}{E\{|e(k)|^2\}} \quad \text{and} \quad \hat{\gamma}_{m_0} := \mathrm{CUM}_4(e(k)), \quad (23)$$

where $\hat{\gamma}_{m_0}$ is the estimated fourth-order cumulant for the source $m_0$. The reconstructed contribution of $e(k)$ to the data $y_i(k)$ $(i = 1, 2, \ldots, N)$ is

$$\hat{y}_{i,m_0}(k) := \hat{f}_{im_0}\, e(k). \quad (24)$$

3. Remove the above contribution from the data to define a linear mixture with N outputs and M - 1 sources. These are given by

$$y'_i(k) := y_i(k) - \hat{y}_{i,m_0}(k). \quad (25)$$

4. If M > 1, set $M \leftarrow M - 1$, $y_i(k) \leftarrow y'_i(k)$, and go back to step 1; else quit.
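
Steps 2 and 3 of the procedure (cross-correlation and cancellation) can be sketched in a few lines of Python. The sketch assumes the extracted signal is $e(k) = d\, s_{m_0}(k)$ and estimates each gain as the cross-correlation of $y_i(k)$ with $e^*(k)$ divided by the power of $e(k)$; the function name, the small mixing matrix, and the constant `d` are hypothetical.

```python
def extract_and_cancel(data, e):
    """Estimate the gains of the extracted source and remove it from the data.

    data: list of snapshots, each a list of N sensor outputs y_i(k)
    e:    extracted signal e(k), one value per snapshot
    """
    n, t = len(data[0]), len(data)
    power = sum(abs(x) ** 2 for x in e) / t
    # Gain estimates: E{y_i e*} / E{|e|^2}
    f_hat = [sum(y[i] * x.conjugate() for y, x in zip(data, e)) / t / power
             for i in range(n)]
    # Residual mixture: y'_i(k) = y_i(k) - f_hat_i e(k)
    residual = [[y[i] - f_hat[i] * x for i in range(n)] for y, x in zip(data, e)]
    return f_hat, residual

# Toy check with two binary sources whose sample statistics are exactly independent.
F = [[1.0 + 0j, 2.0 + 0j],
     [3.0j, 4.0 + 0j]]
signs = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
data = [[F[i][0] * s1 + F[i][1] * s2 for i in range(2)] for s1, s2 in signs]
d = 2.0j                                   # unknown complex scale of the extracted source
e = [d * s1 for s1, _ in signs]
f_hat, residual = extract_and_cancel(data, e)
print(f_hat, residual[0])
```

In this idealized setting the estimated gains equal the first column of `F` divided by `d`, and the residual contains only the second source, which is the scaling ambiguity one expects from blind extraction.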

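Analyzing the above algorithm we have

$$E\{y_i(k) e^*(k)\} = \sum_{m=1}^{M} f_{im} E\{s_m(k) e^*(k)\} = f_{im_0}\, d^*\, \sigma_{sm_0}^2. \quad (26)$$

Using (26) in (23) we have

$$\hat{f}_{im_0} = \frac{f_{im_0}\, d^*\, \sigma_{sm_0}^2}{|d|^2 \sigma_{sm_0}^2} = \frac{f_{im_0}}{d}. \quad (27)$$

It follows from (24) and (27) that

$$\hat{y}_{i,m_0}(k) = f_{im_0}\, s_{m_0}(k). \quad (28)$$

Now use (25) and (28) to deduce that

$$y'_i(k) = \sum_{m=1,\, m \ne m_0}^{M} f_{im}\, s_m(k). \quad (29)$$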


Using (3)-(5), we get

$$\frac{\hat{f}_{im_0}}{\hat{f}_{1m_0}} = \exp\{j[\omega_{m_0}(i-1) + \phi_{m_0}(i-1)^2]\}, \quad i = 1, 2, \ldots, N, \quad (30)$$

where

$$\omega_{m_0} = -2\pi \frac{d}{\lambda} \sin(\theta_{m_0}) \quad \text{and} \quad \phi_{m_0} = \pi \frac{d^2}{\lambda r_{m_0}} \cos^2(\theta_{m_0}). \quad (31)$$

It follows from the preceding developments that under the conditions (GT-1)-(GT-2) and noise $n(k) = 0$, the proposed iterative approach is capable of blind source separation and blind identification of the mixing matrix F up to a scaling and a permutation matrix. That is, given F, we end up with a G where the two are related via

$$G = F \Lambda P \quad (32)$$

where $\Lambda$ is an M × M diagonal scaling matrix (recall d in (14)) and P is an M × M permutation matrix (recall $m_0$ in (14); we don't "know" which source it refers to). When the noise $n(k) \ne 0$, the resultant solution will be biased; how much is problem dependent. This bias motivates the cumulant-matching solution considered in Sec. 7. Since the noise is assumed to be Gaussian, cumulant matching will yield consistent estimates within the class (32).

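6. SOURCE COORDINATE EXTRACTION

To estimate $\omega_{m_0}$ and $\phi_{m_0}$ given F, consider

$$x_i := \frac{f_{(i+1)m_0}}{f_{im_0}} = \exp\{j[\omega_{m_0} + (2i-1)\phi_{m_0}]\}, \quad i = 1, 2, \ldots, N-1. \quad (33)$$

Next define

$$z_i := \frac{x_{i+1}}{x_i} = \exp\{j 2\phi_{m_0}\}, \quad i = 1, 2, \ldots, N-2. \quad (34)$$

Solving (34) for $\phi_{m_0}$, we have

$$\phi_{m_0} = \frac{1}{2(N-2)} \mathrm{Re}\left\{ \frac{1}{j} \sum_{i=1}^{N-2} \ln z_i \right\} \quad (35)$$

which yields the correct solution if no phase-wrapping problems occur. The only condition necessary for no phase wrapping to occur is

$$|2\phi_{m_0}| \le \pi \quad (36)$$

which follows from (34). From (36) and (31), it suffices that

$$r_{m_0} \ge \frac{2d^2}{\lambda} \quad (37)$$

since $\cos^2(\theta_{m_0}) \le 1$. Since $d \le \lambda/2$, we get $r_{m_0} \ge \lambda/2$ as the condition for no phase wrapping to occur. Using (33) and (35), define

$$\bar{x}_i := x_i\, e^{-j(2i-1)\phi_{m_0}}, \quad i = 1, 2, \ldots, N-1. \quad (38)$$

We can then solve for $\omega_{m_0}$ as

$$\omega_{m_0} = \frac{1}{N-1} \mathrm{Re}\left\{ \frac{1}{j} \sum_{i=1}^{N-1} \ln \bar{x}_i \right\}. \quad (39)$$

The source coordinates of the source $m_0$ can then be estimated as

$$\hat{\theta}_{m_0} = -\arcsin\left(\frac{\lambda\, \omega_{m_0}}{2\pi d}\right), \qquad \hat{r}_{m_0} = \frac{\pi d^2 \cos^2(\hat{\theta}_{m_0})}{\lambda\, \phi_{m_0}}. \quad (40)$$

To further refine the estimates of the source coordinates, a two-dimensional grid search is used in the vicinity. This was the method used in the simulations to get the initial guess for cumulant matching (direct approach).

7. CUMULANT MATCHING

7.1. Indirect Approach

Cumulant matching strives to reduce the error between the data-based cumulants and the model-based cumulants by varying the parameters. Let $\hat{C}_4(i_1, i_2, i_3, i_4)$ denote the data-based consistent estimate of $C_4(i_1, i_2, i_3, i_4)$ where

$$C_4(i_1, i_2, i_3, i_4) := \mathrm{cum}_4\{y_{i_1}(k), y_{i_2}^*(k), y_{i_3}(k), y_{i_4}^*(k)\}. \quad (41)$$

For model-based cumulant calculations we fix the fourth-order cumulant $\gamma_m$ of the source m to be

$$\gamma_m = \mathrm{sgn}(\hat{\gamma}_m), \quad (42)$$

where $\hat{\gamma}_m$ is given by (23) and sgn(x) is the sign function. Any non-identity $|\hat{\gamma}_m|$ is "absorbed" in the $f_{im}$'s. The parameter-based cumulant is then given by

$$C_4(i_1, i_2, i_3, i_4 \,|\, F) = \sum_{m=1}^{M} \gamma_m f_{i_1 m} f_{i_2 m}^* f_{i_3 m} f_{i_4 m}^*, \quad (43)$$

where $\gamma_m$ is fixed to the value given by (42) and $f_{i_1 m}$, $f_{i_2 m}$, $f_{i_3 m}$ and $f_{i_4 m}$ are initialized using

$$f_{im} = |\hat{\gamma}_m|^{1/4} \hat{f}_{im} \quad (44)$$

where $\hat{\gamma}_m$ and $\hat{f}_{im}$ are given by (23). Cumulant matching is then used to solve

$$\min_{F} \sum_{i_1=1}^{N} \sum_{i_2=1}^{N} \sum_{i_3=1}^{N} \sum_{i_4=1}^{N} \left| \hat{C}_4(i_1, i_2, i_3, i_4) - C_4(i_1, i_2, i_3, i_4 \,|\, F) \right|^2, \quad (45)$$

with the solution being consistent within the class (32).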
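
The closed-form coordinate estimate of Sec. 6 can be illustrated with a short Python sketch. It assumes a noise-free column of F generated from the near-field model (3)-(5), lengths expressed in wavelengths, and no phase wrapping; the phase-difference averaging used here (ratios of successive entries, then ratios of those ratios) is one reading of the garbled source equations, and all function names are hypothetical.

```python
import cmath
import math

def steering_column(n, theta_deg, r_wl, d_wl):
    """Near-field column f_i = exp{j[omega (i-1) + phi (i-1)^2]}, i = 1..N."""
    th = math.radians(theta_deg)
    omega = -2 * math.pi * d_wl * math.sin(th)
    phi = math.pi * d_wl ** 2 / r_wl * math.cos(th) ** 2
    return [cmath.exp(1j * (omega * i + phi * i * i)) for i in range(n)]

def estimate_coordinates(f, d_wl):
    """Recover (bearing in degrees, range in wavelengths) from one column of F."""
    n = len(f)
    # x_i = f_{i+1}/f_i has phase omega + (2i-1) phi, i = 1..N-1
    x = [f[i + 1] / f[i] for i in range(n - 1)]
    # z_i = x_{i+1}/x_i has phase 2 phi, i = 1..N-2
    z = [x[i + 1] / x[i] for i in range(n - 2)]
    phi = sum(cmath.phase(v) for v in z) / (2 * (n - 2))
    # Remove the quadratic part; zero-based index i corresponds to (2i+1) phi
    x_bar = [x[i] * cmath.exp(-1j * (2 * i + 1) * phi) for i in range(n - 1)]
    omega = sum(cmath.phase(v) for v in x_bar) / (n - 1)
    theta = -math.asin(omega / (2 * math.pi * d_wl))
    r = math.pi * d_wl ** 2 * math.cos(theta) ** 2 / phi
    return math.degrees(theta), r

column = steering_column(11, 20.0, 1.8, 0.25)
print(estimate_coordinates(column, 0.25))
```

Any common complex scale on the column cancels in the ratios, so the estimate is insensitive to the scaling ambiguity of the extracted column; with noisy columns it would only serve as the initial guess for the subsequent refinement.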


7.2. Direct Approach

This is as in Sec. 7.1 except that now each source is parametrized by three parameters: $\gamma_m$, $\theta_m$ and $r_m$. Therefore, the model-based cumulants are now parametrized by these three parameters using (3)-(5) and (43). We now have a much more parsimonious parametrization, particularly when $N \gg M$.

8.  SIMULATION  EXAMPLES 

8.1.  Example  1. 

We consider an example from [2]. There are two (M = 2) near-field non-Gaussian (BPSK) sources of equal power at locations $(\theta_1 = 20^\circ, r_1 = 1.8\lambda)$ and $(\theta_2 = -5^\circ, r_2 = 5\lambda)$. The number of sensors in the uniform linear array is 11 (N = 11). We considered two inter-element spacings: $d = \lambda/4$ as in [2], and $d = \lambda/2$ unlike [2]. The measurement noise is circular white Gaussian and the number of data samples is T = 1000. Table I shows the results based upon 100 Monte Carlo runs for SNR = 10 dB; the results quoted from [2] are for SNR = 15 dB. It is seen that the proposed approaches outperform [2] (in spite of having lower SNR).

8.2.  Example  2. 

We modify the example from [2] to include the case of mixed sources. The source at location $(\theta_2 = -5^\circ, r_2 = 5\lambda)$ is now taken to be an exponential source; the rest is as in Example 1. Table II shows the results based upon 100 simulations for SNR = 10 dB.
8.3. Example 3.

We consider the example from [9] and modify it to generate correlated noise. There are now two BPSK sources of equal power at locations $(\theta_1 = 5^\circ, r_1 = 20\lambda)$ and $(\theta_2 = 17^\circ, r_2 = 15\lambda)$. Gaussian sources could not be used as in [9] since the cumulant of a Gaussian random variable is zero. To generate correlated noise, a Gaussian source at location $(\theta_n = \ldots, r_n = 25\lambda)$ was included. The number of sensors in the uniform linear array is 5 (N = 5) and the inter-element spacing is taken to be $d = \lambda/2$. The measurement noise is circular white Gaussian and the number of data samples is T = 4000. Table III shows the results based upon 100 simulations for SNR = 3 dB.

Remark. For Examples 1 and 2, the SNR is computed as SNR = ..., where S is the signal power given by

$$S = \sum_{m=1}^{M} \sum_{i=1}^{N} \cdots \quad (46)$$

and $\sigma_n^2$ is the noise variance given by $\sigma_n^2 = (2N)^{-1} \sum_{i=1}^{N} \cdots$. Notice that (46) yields signal power summed over all sources; it is not average signal power per source. For Example 3, the SNR is computed as

$$\mathrm{SNR} = \frac{\cdots}{2\left(\sigma_n^2 + \frac{1}{N}\sum_{i=1}^{N} \cdots\right)} \quad (47)$$

where $g_i$ is the gain at the i-th sensor for the Gaussian source.
The approaches used in the tables below are:

(i): Proposed (Inverse Filter: Secs. 5 & 6)

(ii): Proposed (Cumulant Matching: Sec. 7.2)

(iii): [2] (Subspace)

(iv): [2] (Cumulant Matching)


TABLE I: Results for Example 1: N = 11, M = 2, 2 BPSK sources, T = 1000, SNR = 10 dB

Parameters                θ1         r1         θ2         r2
True values               20°        1.8λ       -5°        5λ

d = λ/4
Approach (i)     mean     20.555°    1.765λ     -5.022°    5.008λ
                 σ        ±0.156°    ±0.010λ    ±0.342°    ±0.091λ
Approach (ii)    mean     20.010°    1.800λ     -5.001°    5.001λ
                 σ        ±0.452°    ±0.027λ    ±0.417°    ±0.109λ
Approach (iii)   mean     19.95°     1.78λ      -4.94°     5.22λ
                 σ        ±0.67°     ±0.05λ     ±0.64°     ±0.47λ
Approach (iv)    mean     19.95°     1.80λ      -4.94°     5.22λ
                 σ        ±0.32°     ±0.12λ     ±0.30°     ±1.10λ

d = λ/2
Approach (i)     mean     20.007°    1.800λ     -4.994°    4.999λ
                 σ        ±0.081°    ±0.004λ    ±0.080°    ±0.019λ
Approach (ii)    mean     19.999°    1.800λ     -4.993°    4.999λ
                 σ        ±0.135°    ±0.008λ    ±0.129°    ±0.032λ

TABLE II: Results for Example 2: N = 11, M = 2, one BPSK and one exponential source, T = 1000, SNR = 10 dB, d = λ/4

Parameters                θ1         r1         θ2         r2
True values               20°        1.8λ       -5°        5λ

Approach (i)     mean     20.344°    1.779λ     -5.136°    5.052λ
                 σ        ±0.792°    ±0.049λ    ±0.858°    ±0.256λ
Approach (ii)    mean     19.874°    1.810λ     -5.072°    5.037λ
                 σ        ±1.131°    ±0.079λ    ±1.132°    ±0.318λ

TABLE III: Results for Example 3: N = 5, M = 2, 2 BPSK sources, T = 4000, SNR = 3 dB, d = λ/2

Parameters                θ1         r1         θ2         r2
True values               5°         20λ        17°        15λ

Approach (i)     mean     4.947°     20.507λ    16.862°    15.30?λ
                 σ        ±0.314°    ±1.947λ    ±0.159°    ±0.647λ
Approach (ii)    mean     4.995°     20.334λ    16.957°    15.337λ
                 σ        ±0.404°    ±2.611λ    ±0.382°    ±1.471λ
Abstract

In  this  paper  we  provide  explicit  formulas  for  the 
covariances  of  second-,  third-,  and  fourth-order  sam¬ 
ple  cumulants  as  used  in  narrowband  array  processing. 
These  covariances  provide  a  basis  for  analysing  the 
performance  of  cumulant  based  algorithms  for  finite- 
sample  length,  which  is  in  contrast  to  usual  asymp¬ 
totic  performance  analyses.  The  use  of  these  formu¬ 
las,  which  consist  of  several  thousand  terms,  will  be 
demonstrated,  and  a  rough  idea  of  their  applicability  to 
a  performance  analysis  for  finite  numbers  of  samples 
will  be  given. 


1  Introduction 

Since existing second-order statistics based methods cannot solve numerous problems in narrowband array signal processing, many higher-order cumulant based algorithms have been recently proposed. For example, the virtual-ESPRIT algorithm (VESPA) for direction-finding and recovery of independent sources can also calibrate an array of unknown configuration (see Dogan and Mendel [5]). VESPA can also be extended to direction-finding of highly correlated or coherent sources (see Gonen et al. [8]), which is often the case in practice due to multipath propagation or "smart" jamming. Furthermore, since higher-order cumulants are theoretically insensitive to additive white or colored Gaussian noise, they always suppress such noise. These new higher-order cumulant based algorithms require a performance analysis not only among themselves, but also with second-order statistics based algorithms.

In the past decade, array signal processing methods based on eigen-decomposition of the covariance matrix have received considerable attention, since they provide high resolution with low computational complexity. Xu and Buckley [12] presented a bias analysis of the MUSIC location estimator using a second-order Taylor expansion of the derivative of the null spectrum. They show in their simulations that the bias expression can be accurately applied even for a very limited number N of independent snapshots (N = 20). Reed et al. [11] derived an exact output signal-to-interference-plus-noise ratio (SINR) formula as a function of N for an optimum beamformer. Feldman and Griffiths [7] also investigated the SINR formula of Reed and showed by simulation that this formula can be closely approximated by a simple expression if N is larger than approximately 50. These results show that, perhaps, a small number of samples is sufficient for practical use of many asymptotic performance analyses.

Some performance analyses for higher-order cumulant based methods have already been done. Cardoso and Moulines [2] have derived closed-form expressions of the asymptotic covariance of MUSIC-like direction-of-arrival (DOA) estimates based on two different fourth-order cumulant matrices. Yuen and Friedlander [14] presented an asymptotic performance analysis of ESPRIT [13], higher-order ESPRIT [4], and VESPA.

All these performance analyses are based on the asymptotic covariances of second- or fourth-order sample cumulants. To provide a basis for investigation of the finite-sample case, we will derive the covariances of the second-, third- and fourth-order sample cumulants, i.e., the finite-sample covariances. Given these results, we can address the question of whether higher-order cumulant based algorithms need more samples than second-order cumulant based algorithms, and we can also study adaptive algorithms.


2  Problem  Formulation 

Let P narrowband plane waves centered at a known frequency $\omega_0$ impinge on an arbitrary array composed of M sensors. The received signal vector $r(t) = (r_1(t), \ldots, r_M(t))^T$ can be modeled as

$$r(t) = A(\varphi, \vartheta)\, s(t) + n(t), \quad (1)$$

where $s(t) = (s_1(t), \ldots, s_P(t))^T$ is a P × 1 vector which contains the P zero-mean non-Gaussian source signal complex envelopes at time t, and $A(\varphi, \vartheta)$ is an M × P steering matrix. $\varphi \in [-\pi, \pi]$ is the azimuth angle, $\vartheta \in [0, \pi]$ is the elevation angle and $n(t)$ is an M × 1 vector composed of M zero-mean noise signals. Any noise signal $n_m(t)$, m = 1(1)M (see footnote 1), is independent of any source signal $s_p(t)$, p = 1(1)P. Furthermore, all signals $s_p(t)$, $n_m(t)$ are modeled as sequences of independent and identically, but arbitrarily distributed, complex random variables with finite moments up to the eighth order. This model is commonly used in the far-field case, but it can be extended to coherent signals [8] and to the near-field case [3].

In this paper, we focus our attention on second-,
third-, and fourth-order cumulants, which are respectively
defined as

$$c_{k,l} = E\{r_k(t)\, r_l^*(t)\} \qquad (2)$$

$$c_{k,l,m} = E\{r_k(t)\, r_l^*(t)\, r_m(t)\} \qquad (3)$$

$$c_{k,l,m,n} = E\{r_k(t)\, r_l^*(t)\, r_m^*(t)\, r_n(t)\}
 - E\{r_k(t)\, r_l^*(t)\}\, E\{r_m^*(t)\, r_n(t)\}
 - E\{r_k(t)\, r_m^*(t)\}\, E\{r_l^*(t)\, r_n(t)\}
 - E\{r_k(t)\, r_n(t)\}\, E\{r_l^*(t)\, r_m^*(t)\}, \qquad (4)$$

since $E\{r_m(t)\} = 0\ \forall\, m = 1(1)M$. The second-order,
$\hat c_{k,l}$, and third-order, $\hat c_{k,l,m}$, sample cumulants are
obtained by replacing the expected values, $E\{\cdot\}$, by time
averages $\frac{1}{N}\sum_{t=1}^{N}$, where $N \in \mathbb{N}$ is the number of
samples. In contrast, the fourth-order sample cumulant
$\hat c_{k,l,m,n}$ is estimated more generally by

$$\hat c_{k,l,m,n} = \frac{1}{\alpha} \sum_{t=1}^{N} r_k(t)\, r_l^*(t)\, r_m^*(t)\, r_n(t)
 - \frac{1}{\beta} \sum_{t=1}^{N} r_k(t)\, r_l^*(t) \sum_{p=1}^{N} r_m^*(p)\, r_n(p)
 - \frac{1}{\beta} \sum_{t=1}^{N} r_k(t)\, r_m^*(t) \sum_{p=1}^{N} r_l^*(p)\, r_n(p)
 - \frac{1}{\beta} \sum_{t=1}^{N} r_k(t)\, r_n(t) \sum_{p=1}^{N} r_l^*(p)\, r_m^*(p), \qquad (5)$$

where $\alpha$ and $\beta$ are functions of $N$. If the fourth-order
sample cumulant is to be an unbiased estimate,

$$\alpha = \frac{N^2 - N}{N + 2}, \qquad \beta = N^2 - N, \qquad N > 1; \qquad (6)$$

otherwise, the commonly used equations

$$\alpha = N, \qquad \beta = N^2, \qquad N > 0, \qquad (7)$$

lead to a biased estimator. Note that the second- and
the third-order sample cumulants are always unbiased.

¹ $m = 1(1)M$ is a more compact form of $m = 1, 2, \ldots, M$. The
number in parentheses is the increment of the sequence.
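The estimator (5) with the two choices of scale factors can be sketched in a few lines of Python. This is a minimal illustration, not the authors' published code: the function names `scale_factors` and `fourth_order_sample_cumulant` are hypothetical, and the unbiased factors are taken from the two conditions $N(N-1)/\beta = 1$ and $N/\alpha - 3N/\beta = 1$ that follow from taking the expectation of (5) for zero-mean i.i.d. samples.

```python
def scale_factors(n, unbiased=True):
    """Return (alpha, beta) of Eq. (5) for N = n samples (hypothetical helper)."""
    if unbiased:
        if n <= 1:
            raise ValueError("the unbiased scale factors need N > 1")
        beta = float(n * n - n)   # from N(N-1)/beta = 1
        alpha = beta / (n + 2)    # from N/alpha - 3N/beta = 1
    else:
        alpha, beta = float(n), float(n * n)  # commonly used biased choice
    return alpha, beta


def fourth_order_sample_cumulant(rk, rl, rm, rn, unbiased=True):
    """Estimate cum(r_k, r_l^*, r_m^*, r_n) from zero-mean samples, as in Eq. (5)."""
    n = len(rk)
    alpha, beta = scale_factors(n, unbiased)
    rl_c = [complex(v).conjugate() for v in rl]
    rm_c = [complex(v).conjugate() for v in rm]
    # Single sum over the fourth-order product.
    m4 = sum(a * b * c * d for a, b, c, d in zip(rk, rl_c, rm_c, rn))
    # The three pairings; each is a product of two independent time sums.
    s_kl = sum(a * b for a, b in zip(rk, rl_c))
    s_mn = sum(c * d for c, d in zip(rm_c, rn))
    s_km = sum(a * c for a, c in zip(rk, rm_c))
    s_ln = sum(b * d for b, d in zip(rl_c, rn))
    s_kn = sum(a * d for a, d in zip(rk, rn))
    s_lm = sum(b * c for b, c in zip(rl_c, rm_c))
    return m4 / alpha - (s_kl * s_mn + s_km * s_ln + s_kn * s_lm) / beta
```

For a short BPSK-like sequence such as `[1, -1, 1, -1]`, used for all four arguments, both choices of scale factors return -2, the fourth-order cumulant of a unit-power BPSK signal.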

3  The  Finite-Sample  Covariances 

To demonstrate the tedious calculations of the finite-sample
covariances, first the derivation of the second-order
finite-sample covariance, and then the basic equation
of the fourth-order finite-sample covariance, will
be given. All these formulas are published in different
computer languages on the World-Wide-Web under
http://fb9nt-ln.uni-duisburg.de/mitarbeiter/kaiser/hoswshop.97/hoswshop97.html
and also in [9].

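The second-order finite-sample covariance can be
written as

$$\mathrm{Cov}(\hat c_{k_1,l_1}, \hat c_{k_2,l_2})
 = E\{\hat c_{k_1,l_1}\, \hat c^*_{k_2,l_2}\} - E\{\hat c_{k_1,l_1}\}\, E\{\hat c^*_{k_2,l_2}\}$$
$$= \frac{1}{N} \Big( E\{r_{k_1}(t)\, r^*_{l_1}(t)\, r^*_{k_2}(t)\, r_{l_2}(t)\}
 - E\{r_{k_1}(t)\, r^*_{l_1}(t)\}\, E\{r^*_{k_2}(t)\, r_{l_2}(t)\} \Big), \qquad (8)$$

and is therefore equal to the asymptotic covariance

$$\mathrm{Cov}^{\infty}(\hat c_{k_1,l_1}, \hat c_{k_2,l_2})
 = \lim_{N \to \infty} N\, \mathrm{Cov}(\hat c_{k_1,l_1}, \hat c_{k_2,l_2}) \qquad (9)$$

divided by $N$. Equivalently, the third-order finite-sample
covariance is given by

$$\mathrm{Cov}(\hat c_{k_1,l_1,m_1}, \hat c_{k_2,l_2,m_2}) = \frac{1}{N} \Big(
 E\{r_{k_1}(t)\, r^*_{l_1}(t)\, r_{m_1}(t)\, r^*_{k_2}(t)\, r_{l_2}(t)\, r^*_{m_2}(t)\}
 - E\{r_{k_1}(t)\, r^*_{l_1}(t)\, r_{m_1}(t)\}\, E\{r^*_{k_2}(t)\, r_{l_2}(t)\, r^*_{m_2}(t)\} \Big),$$

and is also equal to the asymptotic covariance
$\mathrm{Cov}^{\infty}(\hat c_{k_1,l_1,m_1}, \hat c_{k_2,l_2,m_2})$ divided by $N$. This means
that any asymptotic performance analyses based on
second- and third-order cumulants are also valid for
the finite-sample case. Unfortunately, as we will see
later, this property does not hold for fourth-order
sample cumulants.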
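A short derivation, assuming only the i.i.d. sample model of Section 2, shows where the factor $1/N$ in (8) comes from. The auxiliary symbols $z_1(t)$ and $z_2(t)$ are introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
z_1(t) &= r_{k_1}(t)\, r^*_{l_1}(t), \qquad z_2(t) = r_{k_2}(t)\, r^*_{l_2}(t), \\
\mathrm{Cov}(\hat c_{k_1,l_1}, \hat c_{k_2,l_2})
  &= \frac{1}{N^2} \sum_{t=1}^{N} \sum_{p=1}^{N} \mathrm{Cov}\big(z_1(t), z_2(p)\big) \\
  &= \frac{1}{N^2} \sum_{t=1}^{N} \mathrm{Cov}\big(z_1(t), z_2(t)\big)
     \qquad \text{(independent samples: terms with } t \neq p \text{ vanish)} \\
  &= \frac{1}{N}\, \mathrm{Cov}(z_1, z_2).
\end{align*}
```

The same argument applies to the third-order sample cumulant, which is also a single time average. It does not carry over to the fourth-order estimator, because that estimator contains products of two time sums.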

For further evaluation of (8), the fourth- and the
second-order moments of $r_k(t)$ must be calculated using
(1). We will omit the variables $t$, $\phi$, $\vartheta$ in the
following, to simplify the notation. After some algebra,
we obtain

$$E\{r_{k_1} r^*_{l_1} r^*_{k_2} r_{l_2}\} =
 E\{a^T_{k_1} s\, s^H a^*_{l_1}\, s^H a^*_{k_2}\, a^T_{l_2} s\}
 + E\{a^T_{k_1} s\, s^H a^*_{l_1}\}\, E\{n^*_{k_2} n_{l_2}\}
 + E\{a^T_{k_1} s\, s^H a^*_{k_2}\}\, E\{n^*_{l_1} n_{l_2}\}
 + E\{a^T_{k_1} s\, a^T_{l_2} s\}\, E\{n^*_{l_1} n^*_{k_2}\}
 + E\{s^H a^*_{l_1}\, s^H a^*_{k_2}\}\, E\{n_{k_1} n_{l_2}\}
 + E\{s^H a^*_{l_1}\, a^T_{l_2} s\}\, E\{n_{k_1} n^*_{k_2}\}
 + E\{s^H a^*_{k_2}\, a^T_{l_2} s\}\, E\{n_{k_1} n^*_{l_1}\}
 + E\{n_{k_1} n^*_{l_1} n^*_{k_2} n_{l_2}\}, \qquad (10)$$

where $a_{k_1} = (a_{1,k_1}, \ldots)^T$ is the $k_1$-th column vector
of the steering matrix $A$ and the elements of $a_{k_1}$
can be written as

$$a_{m,k_1} = g_m(\phi_{k_1}, \vartheta_{k_1})\,
 e^{\,j \frac{2\pi}{\lambda} \left( x_m \sin\vartheta_{k_1} \cos\phi_{k_1}
 + y_m \sin\vartheta_{k_1} \sin\phi_{k_1} + z_m \cos\vartheta_{k_1} \right)}, \qquad (11)$$

in which $g_m(\phi_{k_1}, \vartheta_{k_1})$ is the response of the $m$-th sensor
to a source signal impinging on the array with angles
$\phi_{k_1}$, $\vartheta_{k_1}$; $(x_m, y_m, z_m)$ are the coordinates of the $m$-th
sensor; $\lambda = 2\pi c/\omega_0$ is the source wavelength; and
$c$ is the propagation speed of the plane wave. To further
calculate (10), the random variables ($s$) must be
separated from the deterministic variables ($a_k$). Since
we will also be faced with this problem for third- and
fourth-order finite-sample covariances, we look for a
general notation. Between the scalar factors in (10), we
can simply introduce a Kronecker product ($\otimes$) (instead
of transposing). Using the rule $(AB) \otimes (CD) =
(A \otimes C)(B \otimes D)$ we obtain, after some algebra,


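$$\mathrm{Cov}(\hat c_{k_1,l_1}, \hat c_{k_2,l_2}) = \frac{1}{N} \Big(
 (a^T_{k_1} \otimes a^H_{l_1} \otimes a^H_{k_2} \otimes a^T_{l_2})
 \big( E\{s \otimes s^* \otimes s^* \otimes s\} - E\{s \otimes s^*\} \otimes E\{s^* \otimes s\} \big)
 + (a^T_{k_1} \otimes a^H_{k_2})\, E\{s \otimes s^*\}\, E\{n^*_{l_1} n_{l_2}\}
 + (a^T_{k_1} \otimes a^T_{l_2})\, E\{s \otimes s\}\, E\{n^*_{l_1} n^*_{k_2}\}
 + (a^H_{l_1} \otimes a^H_{k_2})\, E\{s^* \otimes s^*\}\, E\{n_{k_1} n_{l_2}\}
 + (a^H_{l_1} \otimes a^T_{l_2})\, E\{s^* \otimes s\}\, E\{n_{k_1} n^*_{k_2}\}
 + E\{n_{k_1} n^*_{l_1} n^*_{k_2} n_{l_2}\} - E\{n_{k_1} n^*_{l_1}\}\, E\{n^*_{k_2} n_{l_2}\} \Big). \qquad (12)$$

Given the statistics of the source and noise signals up
to order 4, the second-order finite-sample covariance
can be calculated.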
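The Kronecker rule used to separate random from deterministic factors can be checked numerically. The sketch below is illustrative only; the helper names `matmul`, `kron`, and `scalar_product_via_kron` are hypothetical, and the vectors in the usage example are arbitrary test values, not data from the paper. It rewrites the scalar product $(a_k^T s)(a_l^H s^*)$ as $(a_k^T \otimes a_l^H)(s \otimes s^*)$.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker product of two list-of-lists matrices."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def scalar_product_via_kron(a_k, a_l, s):
    """Compute (a_k^T s)(a_l^H s^*) as (a_k^T kron a_l^H)(s kron s^*)."""
    row_k = [list(a_k)]                                   # a_k^T, 1 x P
    row_l = [[complex(v).conjugate() for v in a_l]]       # a_l^H, 1 x P
    col_s = [[v] for v in s]                              # s,     P x 1
    col_sc = [[complex(v).conjugate()] for v in s]        # s^*,   P x 1
    # Deterministic factor (row) times random factor (column).
    return matmul(kron(row_k, row_l), kron(col_s, col_sc))[0][0]
```

The point of the rewrite is that the expectation then acts only on the column factor $s \otimes s^*$, while the row factor built from the steering vectors stays outside, which is the structure of each term in (12).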

Due  to  lack  of  space,  we  omit  the  formula  for  the 
third-order  finite-sample  covariance.  The  fourth-order 
finite-sample  covariance  can  be  written,  after  tedious 
calculations,  as 


$$\mathrm{Cov}(\hat c_{k_1,l_1,m_1,n_1}, \hat c_{k_2,l_2,m_2,n_2}) = \ldots \qquad (13)$$

[The right-hand side of (13) is not legible in this copy. What can be
recovered is its structure: a sum of products of second-, fourth-,
sixth-, and eighth-order moments of $r$, grouped under scale factors
that depend on $N$, $\alpha$, and $\beta$.]

Inserting (1) for the elements of $r(t)$ and using the general
Kronecker product notation yields, even for $P = 1$, 2504
terms involving moments of $s_1(t)$, $n_m(t)$.

If $\alpha = N$ and $\beta = N^2$, (13) can be written as

$$\mathrm{Cov}(\hat c_{k_1,l_1,m_1,n_1}, \hat c_{k_2,l_2,m_2,n_2})
 = \frac{b_1}{N} + \frac{b_2}{N^2} + \frac{b_3}{N^3}, \qquad (14)$$

whereas, if $\alpha$, $\beta$ are given by (6), then

$$\mathrm{Cov}(\hat c_{k_1,l_1,m_1,n_1}, \hat c_{k_2,l_2,m_2,n_2})
 = \frac{b_4}{N} + \frac{b_5}{N - 1} + \frac{b_6}{N(N - 1)}, \qquad (15)$$

where $b_i$, $i = 1(1)6$, are constants. This is in contrast
to the second- and third-order sample cumulants, whose
finite-sample covariances are proportional to $1/N$.
Consequently, an asymptotic performance analysis cannot
simply be used to investigate the finite-sample case for
algorithms based on fourth-order cumulants. In this context
the question arises as to whether $\alpha$, $\beta$ can be chosen so
that $\mathrm{Cov}(\hat c_{k_1,l_1,m_1,n_1}, \hat c_{k_2,l_2,m_2,n_2})$ is proportional to $1/N$.
The answer is no, since it can easily be shown that if the
first two scale factors of (13) are proportional to
$1/N$, the remaining scale factors are not. Interestingly, this
leads to (6) for $\alpha$ and $\beta$.

4 Simulations and Analytic Evaluations


The purposes of this section are twofold, namely: 1)
to "verify" the formulas, especially the fourth-order
finite-sample covariance, by Monte-Carlo simulation and analytic
evaluation; and 2) to give a rough idea of the applicability
of an asymptotic performance analysis for finite $N$. For
the second purpose we consider the relative error between
the finite-sample and the asymptotic variance of the
fourth-order sample cumulant,

$$e_{k,l,m,n} = \frac{N\, \mathrm{Var}(\hat c_{k,l,m,n}) - \mathrm{Var}^{\infty}(\hat c_{k,l,m,n})}
 {N\, \mathrm{Var}(\hat c_{k,l,m,n})}, \qquad (16)$$

where $\mathrm{Var}^{\infty}$ is defined equivalently to (9). In the case of
the biased estimator (7), $e_{k,l,m,n}$ can be written as

$$e_{k,l,m,n} = \frac{N b_2 + b_3}{N^2 b_1 + N b_2 + b_3} \approx \frac{b_2}{b_1 N}, \qquad (17)$$

and, for the unbiased estimator (6),

$$e_{k,l,m,n} = \frac{b_5 + b_6}{(N - 1)\, b_4 + N b_5 + b_6} \approx \frac{b_5 + b_6}{(b_4 + b_5)\, N}. \qquad (18)$$
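The relative errors (17) and (18) follow directly from inserting the finite-sample variances (14) and (15) into the definition (16). The sketch below makes that step explicit. The function names are hypothetical, and the constants passed in the usage note are arbitrary illustration values, since the paper does not list numerical values for $b_1, \ldots, b_6$.

```python
def relative_error(n, var_finite, var_asymptotic):
    """Definition (16): (N Var - Var_inf) / (N Var)."""
    return (n * var_finite - var_asymptotic) / (n * var_finite)


def relative_error_biased(n, b1, b2, b3):
    """Closed form (17) for alpha = N, beta = N^2."""
    return (n * b2 + b3) / (n * n * b1 + n * b2 + b3)


def relative_error_unbiased(n, b4, b5, b6):
    """Closed form (18) for the unbiased scale factors (6)."""
    return (b5 + b6) / ((n - 1) * b4 + n * b5 + b6)


def variance_biased(n, b1, b2, b3):
    """Finite-sample variance of the form (14)."""
    return b1 / n + b2 / n**2 + b3 / n**3


def variance_unbiased(n, b4, b5, b6):
    """Finite-sample variance of the form (15)."""
    return b4 / n + b5 / (n - 1) + b6 / (n * (n - 1))
```

For example, with the arbitrary constants `(2.0, 3.0, 5.0)` and `n = 10`, `relative_error(10, variance_biased(10, 2.0, 3.0, 5.0), 2.0)` agrees with `relative_error_biased(10, 2.0, 3.0, 5.0)`. Note that the asymptotic variance is $b_1$ in the biased case and $b_4 + b_5$ in the unbiased case.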


4.1 Example 1: Comparison of Monte-Carlo Simulations
and Analytic Evaluation


To demonstrate the correctness of our formulas, we avoid
symmetries in the parameters (e.g., we choose a non-linear
array) and ensure that important parameters are not zero (e.g., we
choose skewed noise with nonzero odd moments).
We consider a non-linear array [5] of $M = 5$ radar
antennas with response $g_m(\phi, \vartheta) = 1/\sin\vartheta$ [1]. Two
64-QAM signals consisting of $N = 20$ samples impinge on this
array from directions $\phi_1 = 0^\circ$, $\phi_2 = 20^\circ$, $\vartheta_1 = \vartheta_2 = 40^\circ$
with $E\{|s_i(t)|^2\} = 1$ for $i = 1, 2$. The signal-to-noise
ratio varies as SNR = -60 dB (10 dB) 60 dB, and the additive
noise is logarithmic Gaussian distributed with skewness


Figure 1. $V = \log_{10} \mathrm{Var}(\hat c_{4,3,2,4})$ (solid) and
$\hat V = \log_{10} \widehat{\mathrm{Var}}(\hat c_{4,3,2,4})$ (stars) for $N = 20$ samples.


Fig. 1 shows a very good agreement between the
analytically calculated variance $V$ and the estimated
variance $\hat V$ (by Monte-Carlo simulations). The clear
increase of the variance for low SNR comes from the
large higher-order noise moments. Similarly, the
slight decrease for high SNR comes from the small
higher-order signal moments. Figure 2 shows $V$ and
$\hat V$ for SNR fixed to 0 dB and the second azimuth angle
$\phi_2 = -20^\circ (1^\circ) 20^\circ$. All other parameters remain
unchanged. It can be seen that the agreement between
$V$ and $\hat V$ is not as good as in Fig. 1, especially for
$\phi_2 \approx \phi_1 = 0^\circ$. This means that, for different
parameters (SNR, $\phi$), a different number of Monte-Carlo
simulations is necessary to get reliable results about the
(co)variance of finite-sample cumulants. This can be
avoided by use of our analytic formulas. The example's
results demonstrate the validity of our formulas;
numerous other experiments, which are not
shown here, also confirm this.


Figure 2. $V = \log_{10} \mathrm{Var}(\hat c_{4,3,2,4})$ (solid) and
$\hat V = \log_{10} \widehat{\mathrm{Var}}(\hat c_{4,3,2,4})$ (stars) for $N = 20$ samples.


4.2 Example 2: Dependence of $e_{k,l,m,n}$ on SNR,


In this example, we assume a linear array of 10
omnidirectional ($g_m(\phi_k, \vartheta_k) = 1\ \forall\, k = 1(1)P,\ m = 1(1)M$)
sensors with spacing $\lambda/2$. The azimuth and elevation
angles are $\phi = 0^\circ$, $\vartheta = 0^\circ$, and $(k, l, m, n)$ is fixed to
$(1, 1, 1, 1)$. $s(t)$ is a BPSK signal, SNR = -20 (10) 20 dB,
and $P = 1$. For $N \gg 1$ the behaviour of both relative
errors in the Fig. 3 plots is linear, as expected from
(17), (18). On the other hand, the curves are non-linear
for small values of $N < 50$, which is caused by
$b_3$ or $b_6$. Somewhat surprisingly, the relative error for
SNR = -20 dB exhibits a sharp break near $N = 8$ in
the case of the biased estimator, because $N b_2 + b_3$ is
near zero for $N = 8$. This means that the finite-sample
and asymptotic variance are approximately equal for
$N = 8$. Therefore, it can happen that an asymptotic
performance analysis is in nearly perfect agreement
with a Monte-Carlo simulation even for very small
$N$. In addition, the biased estimator leads to a smaller
relative error for low SNR, but, for higher SNR, the
relative error of the unbiased estimator is smaller. Other
experiments have shown that neither the arrival angles
$\phi$, $\vartheta$ nor the number of sensors $M$ has an
impact on the upper curves. This also remains true for
two equal-powered sources.

Observe from Fig. 3 that only $N \approx 300$ samples are
necessary for a relative error of 0.01; hence, asymptotic
performance analysis can often be applied with
sufficient accuracy in narrowband array processing for
a small number of samples and for low SNR.


5  Conclusions 

In this paper we have provided explicit formulas
for the covariances of second-, third-, and fourth-order
sample cumulants for narrowband array processing.
These equations can be used not only for finite-sample
performance analyses of cumulant-based algorithms
but also for optimization of some parameters to
obtain the highest estimation accuracy, e.g., the kind of
source signals. We demonstrated the correct working
of these formulas by examples and showed that asymptotic
performance analysis can often be applied with
sufficient accuracy in narrowband array processing for
a small number of samples and for low SNR.

Abstract

In the present paper we propose a new approach for
the estimation of DOA for polarized EM waves using
blind separation of sources. In this approach we use
a vector-sensor, a sensor whose output is a complete
set of the EM field components of the irradiating wave,
and we reconstruct the waveforms of all the original
signals; that is, all the EM components of the source's
field. The blind separation of sources is performed
iteratively using a recurrent Hopfield-like single-layer neural
network. The simulation results for two sources have
been investigated. We have considered coherent and
incoherent sources, and also the case of DOAs varying
relative to the sensor and a varying polarization.


1  Introduction 

The problem of estimating the DOA has so far been
tackled using different approaches. Most of them
employed sensor arrays to carry out space-time processing
by means of spatial filtering or beamforming. The
performance of this approach is, however, directly
dependent upon the physical size of the array, regardless
of the size and quality (in the sense of signal-to-noise
ratio) of the available data [1,2]. The limitation of this
approach due to its poor resolution (ability to distinguish
between two sources) and the need for extension
to more than one source, required in many applications,
have motivated the research in statistical signal
processing. This finally resulted in the emergence of the
signal parameter estimation approach as an active area.
Important results include, to cite only a few, AR/MA,
maximum entropy (ME) [3], and Multiple Signal
Classification (MUSIC) [4], [5].

This paper considers multiple electromagnetic
source localization using sensors whose outputs
correspond to the complete electric and magnetic fields
at the sensor. We assume that the wave is traveling
in a nonconductive, homogeneous, and isotropic
medium. Additionally we assume a plane wave, which
is equivalent to a far-field assumption (or a maximum
wavelength that is much smaller than the source-to-sensor
distance), a point-source assumption (i.e., the
source size is much smaller than the source-to-sensor
distance), and a point-like sensor (i.e., the dimension
of the sensors is small compared with the minimum
wavelength). Such a wave can be expressed as

$$s(t) = a_i \exp(j w_i t). \qquad (1)$$

Thus, the vector of signals incident upon a linear
equispaced array of $n$ sensors is described by

$$y(t) = A s + \text{noise} = \sum_{i=1}^{m} a_i \exp(j w_i t)\, d_i + \eta(t), \qquad (2)$$

where $m$ is the number of sources irradiating the array,
and $a_i$ and $w_i = 2\pi f_i$ are the amplitude and angular
frequency of the $i$th source, respectively. The $n$-element vector
$d_i$ is the direction vector (also called steering vector)
corresponding to each source,

di  =  . 1  <  *<  n 

(3) 

$$\tau_i = \frac{2\pi D}{\lambda} \sin\theta_i. \qquad (4)$$

The vector-sensor array processing was addressed in
[6], where a direct solution was given using the
Cramér-Rao bound. Using these vector sensors should
outperform the scalar sensors used by the earlier described
methods because they allow us to make full use of
the EM field inner characteristics and the information
therein. Exploiting the waveforms directly enables us
to use the phase information and hence estimate the
DOA for coherent sources. Notably, determining
the DOA for coherent sources is a very challenging
task for spectral methods.

Figure 1. Field observation (elliptically polarized incident wave).

three field components $H_x$, $H_y$, and $E_z$. Let
$M_{iz}$ and $\psi_{iz}$ denote the amplitude ratios and phase differences
(with $E_z$ as a reference) defined by

$$M_{iz} = \frac{|H_i|}{|E_z|} = \frac{A_i}{A_z}. \qquad (7)$$


By  simple  manipulations  on  Eqs.  (5)  and  (7)  we  get 
the  DOA  and  the  polarization  [10], 


[Equation (8): closed-form expressions for $\tan\phi$, $\sin\theta$, and the
polarization $p$ in terms of $M_{xz}$, $M_{yz}$, $\psi_{xz}$, and $\psi_{yz}$. The
expressions are not legible in this copy; see [10].]


In the following section we show how to estimate $H_i(t)$
and $E_z(t)$.


2  Measurement  Model 

In this section we study the measurement model
used in this approach and consequently derive the DOA
and polarization of the incident source. Figure 1 shows
the configuration of the measurement model, in which
a downgoing wave makes an incident angle $\theta$ with the
vertical direction and an azimuthal angle $\phi$ with the
East. In the figure, $H_\parallel$ and $E_\parallel$ indicate the magnetic
and electric field components in the plane of incidence,
while $H_\perp$ and $E_\perp$ are those perpendicular to it. The EM
field components at the point P, whose height is small
compared with a wavelength, are given by the following
equations.

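$$H_x = 2 A_\perp e^{jwt} \{ (\cos\phi - x \cos\theta \sin\phi) - j y \cos\theta \sin\phi \}
 = A_x \cos(wt + \psi_x)$$

$$H_y = 2 A_\perp e^{jwt} \{ (\sin\phi + x \cos\theta \cos\phi) + j y \cos\theta \cos\phi \}
 = A_y \cos(wt + \psi_y)$$

$$E_z = -2 A_\perp e^{jwt} \sin\theta = A_z \cos(wt). \qquad (5)$$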

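In the above notation, $H_\perp = A_\perp \cos wt$ and
$H_\parallel = A_\parallel \cos(wt + \alpha)$, ($j = \sqrt{-1}$). The wave polarization
is defined by

$$p = \frac{H_\parallel}{H_\perp} = \frac{A_\parallel}{A_\perp} \exp(j\alpha) = x + jy. \qquad (6)$$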
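The polarization definition can be illustrated numerically. The sketch below is a minimal example under the assumption that the definition reads $p = (A_\parallel / A_\perp) \exp(j\alpha) = x + jy$; the function names `polarization` and `amplitude_ratio_and_phase` are hypothetical and introduced here for illustration only.

```python
import cmath
import math


def polarization(a_par, a_perp, alpha):
    """Complex polarization p = (A_par / A_perp) * exp(j * alpha)."""
    return (a_par / a_perp) * cmath.exp(1j * alpha)


def amplitude_ratio_and_phase(p):
    """Inverse map: recover (A_par / A_perp, alpha) from p = x + jy."""
    return abs(p), cmath.phase(p)
```

Equal amplitudes with a phase difference of 90 degrees give $p = j$, and an amplitude ratio of $1/\sqrt{2}$ with a phase difference of 45 degrees gives $p = 0.5 + 0.5j$.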

As we can see from the above equations, it is easy
to calculate the DOA and polarization of the incident
wave if we can determine the waveforms of the


3 Blind Separation of the EM Field Components

3.1 Blind separation of sources

We consider the output of an array of sensors and
suppose that the output signals are the sum of many
independent but completely unknown original source
signals, as expressed by Eq. (2). The blind separation of
the original signals consists in their recovery from the
observation of a linear mixture observed at the array
output. Note that the independence of sources is the
only a priori information available.

Let $x(t)$ denote the solution to the problem; that
is, the estimated sources. Since this solution is based
on a test of statistical independence (all sources
are mutually independent), any scaled permutation of
these sources will satisfy this test. This leads to an
indeterminacy in the solution in both its scale and its
arrangement, which can be expressed by

$$x(t) = \Gamma P s(t). \qquad (10)$$

$\Gamma$ is a diagonal scaling matrix and $P$ is any permutation
matrix. $x(t)$ and $s(t)$ are the estimated and original
signals, respectively. We will show later that even
though it is serious in blind separation theory, this
indeterminacy is not harmful in the present application.

The problem of separation was first addressed by J.
Herault and C. Jutten, who proposed a solution based
on neural computing [7, 11], as shown in Fig. 2. The
estimated solution $x$ in the figure is a weighted sum
of the sensors' output $y$ as given by Eq. (2). It has
been shown theoretically that the architecture of such
a network allows us to separate the unknown sources.
Practically, this requires some additional information
in order to bring the network to the optimal solution.

Figure 2. Architecture of the neural solution.

In the present case, the only available a priori information
is that the original sources $s_i$ are statistically
independent. Therefore, after the separation, the output
signals $x_i$ must be independent as well. For simplicity
of our explanation let us assume at the beginning that
the average values of the input signals are zero, i.e.

$$E_M(s_i) = 0 \quad \text{and} \quad E_M(x_i) = 0, \qquad i = 1, \ldots, m, \qquad (11)$$

where $E_M(\cdot)$ is the statistical mean. Two signals that are
statistically independent are at least decorrelated (zero
covariance), i.e., the expected values of their products
are null; that is,

$$E_M(s_i s_j) = 0 \quad \text{and} \quad E_M(x_i x_j) = 0. \qquad (12)$$

However, the zero covariance of Eq. (12) is not a
sufficient condition for statistical independence. A
much stronger requirement for statistical independence
is

$$E_M(s_i^{\alpha} s_j^{\beta}) = 0 \quad \text{and} \quad E_M(x_i^{\alpha} x_j^{\beta}) = 0, \qquad (13)$$

for any odd integer values of $\alpha$ and $\beta$. On the basis of this
requirement Herault and Jutten proposed a learning
rule of the following form

(14)

where $\mu > 0$ is a positive learning rate and $f(\cdot)$ and $g(\cdot)$
are two odd nonlinear functions. Thus, the weights are
updated iteratively by the HJ algorithm starting from
randomly chosen small values.
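A two-channel version of such a recurrent network can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the commonly cited form of the Herault-Jutten update, $\Delta c_{ij} = \mu\, f(x_i)\, g(x_j)$ for $i \neq j$, with outputs defined by $x = y - Cx$ and a zero diagonal in $C$. The choices $f(v) = v^3$ and $g(v) = v$, the real-valued signals, and the function names are assumptions made for the example.

```python
def hj_outputs(y1, y2, c12, c21):
    """Solve x = y - C x for the 2x2 recurrent network (zero diagonal)."""
    det = 1.0 - c12 * c21
    x1 = (y1 - c12 * y2) / det
    x2 = (y2 - c21 * y1) / det
    return x1, x2


def hj_step(y1, y2, c12, c21, mu, f=lambda v: v ** 3, g=lambda v: v):
    """One update of the cross weights: c_ij += mu * f(x_i) * g(x_j)."""
    x1, x2 = hj_outputs(y1, y2, c12, c21)
    c12 += mu * f(x1) * g(x2)
    c21 += mu * f(x2) * g(x1)
    return c12, c21, x1, x2


def hj_train(y1_seq, y2_seq, mu=0.01, c12=0.0, c21=0.0):
    """Run the update over a sequence of zero-mean sensor samples."""
    for y1, y2 in zip(y1_seq, y2_seq):
        c12, c21, _, _ = hj_step(y1, y2, c12, c21, mu)
    return c12, c21
```

The update drives the averages of $f(x_i)\, g(x_j)$ toward zero, which is the practical counterpart of the higher-order requirement (13). The sketch starts from zero weights for reproducibility; the text above starts from small random values. Convergence depends on the learning rate and on the source statistics, so none is claimed here.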

3.2 Separation of the EM field components


We apply the previous algorithm to separate each of
the field components. If we consider the case of three
components and the case of two sources, we have a
set of three pairs of signals: $(E_z^1, E_z^2)$, $(H_x^1, H_x^2)$, and
$(H_y^1, H_y^2)$. $(\cdot)^1$ are the components of the first source
and $(\cdot)^2$ are those of the second source. Figure 3 shows
the separation diagram for a three-component field and
two sources. In order to separate two sources ($m = 2$),
we need to set at least two antennae ($n = 2$) at a
distance $D$ from each other and measure the three
components of the EM field. The output of the sensors is
given by Eq. (2) for each component. If we denote by $s$
any of these components ($H_x$, $H_y$, $E_z$), the outputs $y$ of
its corresponding sensors are given by

$$y_1 = a_{11} s_1 + \ldots + \eta_1$$
$$y_2 = \ldots + a_{22} s_2 + \eta_2, \qquad (15)$$

where $\tau_i$ is given by Eq. (4) and $\eta$ is an additive noise.
By analogy to Eq. (2), the mixing matrix is of the form

$$A = \begin{pmatrix} a_{11} & a_{12}\, e^{\ldots} \\ a_{21}\, e^{\ldots} & a_{22} \end{pmatrix}. \qquad (16)$$

To solve the system of Eq. (15) we need the
coefficients $a_{ij}$ to be at least constant over the estimation
time. If the incident angle is constant over
the separation time, then for a given frequency $w$ the
phase term involving $\tau_i$ is a complex constant. In other words, the
incident angle should be constant over the separation
time. If this requirement is met, then in the presence of
an additive noise we get the system of Eq. (2) and we
can adopt the HJ algorithm, which requires a constant
matrix $A$.

Figure 3. Diagram of DOA finder based on blind separation.

4 Simulation

In this section we show some results of computer
simulation. We consider the estimation of the DOA
and polarization for two sources, which are
assumed to be elliptically polarized and to have different
incident and azimuthal angles, $\theta$ and $\phi$. Figure 4
shows the estimated results of DOA and polarization
for two sources using the proposed algorithm. The
upper and middle boxes show the incident and azimuth
angles, $\theta$ and $\phi$ respectively, while the lower
boxes show the polarization for the two sources. In
the two upper boxes, the solid and dashed lines stand
for sources I and II, respectively.

Figure 4. Blind estimation of DOA and polarization for two coherent sources.

Table 1. Summary of estimation results.

                      Without noise            With noise
        Assumed     Estim.     Error        Estim.      σ
        (Deg.)      (Deg.)     (10^-2)      (Mean)      (Deg.)
θ1      40          39.90      1.0          40.23       1.66
θ2      -15         -15.0      1.0          -14.7       1.96
φ1      60          59.90      1.0          59.38       2.62
φ2      30          29.90      1.0          34.0        12.75
p1      0, 1        0, .99     1.0          .014, 1.0   .08, .07
p2      .5, .5      .5, .5     1.2          .49, .51    .14, .09

Source I is assumed
to be circularly right-handed polarized ($p_1 = j$)
and its assumed DOA is ($\theta_1 = 40^\circ$, $\phi_1 = 60^\circ$), while
source II is elliptically polarized with $p_2 = \tfrac{1}{2}(1 + j)$
and its DOA is ($\theta_2 = -15^\circ$, $\phi_2 = 30^\circ$). We first consider
the estimation case for coherent sources, which
is a rather difficult and challenging task, especially
for spectral methods where the frequency of the
source is the only available information. This approach
overcomes this difficulty by benefiting from the
phase information calculated directly from the reconstructed
waveforms. The DOAs in noiseless conditions
are estimated to be ($\theta_1 = 39.9^\circ$, $\phi_1 = 59.9^\circ$) and
($\theta_2 = -15.0^\circ$, $\phi_2 = 29.9^\circ$) for sources I and II,
respectively, with an error of almost 1%, as shown
in Table 1. The mean polarization for each source is
$p_1 = (0.99 \pm 0.024)j$ and $p_2 = 0.5 + 0.5j \pm 0.012$. The
two-cell network is seen to converge after 158 ms, as
shown in Fig. 4. Table 1 summarizes the estimation
results for two coherent sources.

We also considered
the case where the DOA is slightly varying with time;
that is, the incident and/or azimuthal angle is not fixed
relative to the sensor. This happens in some applications
when the transmitter is moving. The upper box in
Fig. 5 indicates the estimated incident angles $\theta_1$ and $\theta_2$,
where the original or assumed angle $\theta_1$ varies sinusoidally
with time and is plotted in dashed lines. The
comparison of the curves at the top of Fig. 5 indicates
that the estimate lags the original $\theta_1$ by 40 ms,
which is partly due to the presence of filters and other
processing units that induce small delays. We also
investigated the case of varying polarization and the case
where the two sources have very close polarizations.

Figure 5. The estimation of incident and azimuthal angles and the
polarization of the two sources when the incident angle $\theta_1$ is
varying sinusoidally with time.

We have mentioned in our theoretical study that the
solution given by the HJ algorithm is subject to an
indeterminacy in the scale and the permutation order.
For this application, it is easy to distinguish the two
sources by using the inner statistical properties of the
field components. As a matter of fact, it suffices to run
a correlation test on two components to separate the
components of one source. As for the scale indeterminacy,
the simulation results have shown that this
scale factor is almost equal for all components of one
field, and because we use only the amplitude ratios, this
scale factor does not have any effect on the estimation
results. The other plausible result concerns
the polarization. So far, most of the approaches concerning
the estimation of the DOA of polarized electromagnetic
waves have imposed some specific assumptions
on the source polarization in order to obtain
an explicit theoretical form for the DOA [12, 13]. The
present approach is found to be advantageous over the
previous studies, since we can get all the information
from the waveform, and we did not need any additional
assumption on the source polarization.

Abstract

Most algebraic methods for Independent Component
Analysis (ICA) consist of a second-order and a higher-order
stage. The former can be considered as a classical Principal
Component Analysis (PCA), with a three-fold goal: (a)
reduction of the parameter set of unknowns to the manifold
of orthogonal matrices, (b) standardization of the unknown
source signals to mutually uncorrelated unit-variance signals,
and (c) determination of the number of sources. In
the higher-order stage the remaining unknown orthogonal
factor is determined by imposing statistical independence
on the source estimates. Like all correlation-based techniques,
this set-up has the disadvantage that it is affected
by additive Gaussian noise. However, it is possible to solve
the problem, in a way that is conceptually blind to additive
Gaussian noise, by resorting only to higher-order cumulants.
The purpose of this paper is to explain how the
dimensionality of the ICA-model can algebraically be reduced
to the true number of sources in higher-order-only schemes.


1.  Introduction 

In this paper, the basic ICA-model will be denoted as
follows:

$$Y = MX + N \qquad (1)$$

in which the observed vector $Y$, the source vector $X$ and the
noise vector $N$ are zero-mean random vectors with values
in $\mathbb{R}$ or $\mathbb{C}$. The components of $X$ are assumed to be
mutually statistically independent, as well as statistically
independent from the noise components; the mixing matrix $M$
has linearly independent columns. As is well known, the
goal of ICA consists of the estimation of the mixing matrix
and the corresponding realizations of $X$, given only
realizations of $Y$.

ICA-algorithms often have the form of a two-stage procedure.
First, in a prewhitening step, the second-order
statistics of the observation vector $Y$ are used to "standardize"
the problem: $Y$ is transformed into a vector $\tilde{Y}$ with
unit covariance. In the more-sensors-than-sources case this
also implies a projection of $Y$ onto the dominant subspace
of the data. The standardization step makes it possible to restrict
the unknown mixing matrix to the manifold of orthogonal
matrices. In the second step, the orthogonal factor is
then obtained from a higher-order cumulant tensor of $\tilde{Y}$.
Examples of algebraic prewhitening-based ICA-methods
are [4, 3, 7, 10].
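
As a minimal sketch of the prewhitening step just described, the NumPy code below standardizes noise-free synthetic data. The function name `prewhiten`, the uniform source distribution and the dimensions are illustrative assumptions; the code is not taken from any of the cited methods.

```python
import numpy as np

def prewhiten(Y, R):
    """Standardize zero-mean data Y (I x T): project onto the dominant
    R-dimensional subspace and rescale to unit covariance."""
    T = Y.shape[1]
    C2 = (Y @ Y.T) / T                       # sample covariance of Y
    eigval, eigvec = np.linalg.eigh(C2)      # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:R]     # indices of the R largest ones
    U = eigvec[:, order]                     # dominant eigenvectors (I x R)
    S = np.sqrt(eigval[order])               # square roots of the eigenvalues
    F = U * S                                # square-root factor F = U S
    Y_white = (U / S).T @ Y                  # S^{-1} U^T Y has unit covariance
    return Y_white, F

# Hypothetical test data: 2 unit-variance uniform sources, 5 sensors, no noise.
rng = np.random.default_rng(0)
I, R, T = 5, 2, 20000
M = rng.standard_normal((I, R))
X = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(R, T))
Y = M @ X
Y = Y - Y.mean(axis=1, keepdims=True)
Y_white, F = prewhiten(Y, R)
cov_white = (Y_white @ Y_white.T) / T
```

After this step only an orthogonal (R x R) factor remains unknown, which is what the higher-order stage has to determine.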

On the other hand, it is also possible to identify the mixing
matrix by using only the higher-order cumulant tensor.
For algebraic techniques, we refer to [2, 1, 5, 11]. Such
higher-order-only methods have the interesting feature that
they make it possible to boost signal-to-noise ratios when the noise
is Gaussian. Although there is no prewhitening stage, one
may still want to reduce the dimensionality of the higher-order
cumulant in the more-sensors-than-sources case, as
this may significantly reduce the computational effort in
the actual algorithm. In this paper, we will investigate how
the dimensionality reduction can be achieved, from a multilinear
algebraic point of view.

In Section 2, we will introduce some necessary background
material on multilinear algebra. In Section 3 the actual
core of our contribution is addressed: where the problem
of dimensionality reduction in PCA can essentially be
formulated as the best rank-$R$ approximation of a given matrix,
the higher-order-only problem leads to a clear multilinear
equivalent, denoted as the best rank-$(R, R, \ldots, R)$
approximation (in least-squares sense) of the given higher-order
cumulant tensor. In Section 4 we discuss if and how
the approximation problem can be solved by a tensorial
generalization of truncating the Eigenvalue Decomposition
(EVD) or Singular Value Decomposition (SVD). Section 5
deals with the local minimisation of the quadratic error criterion
by an Alternating Least Squares (ALS) algorithm
that can be interpreted as a higher-order generalization of
the technique of orthogonal iterations [12]. Quite some research
on related topics has been done in the field of Psychometrics;
the main ideas behind Sections 4 and 5 can
also be found in [13]. In Section 6 we give an explicit expression
for the gradient of the objective function, for use
in gradient-based optimization routines. Section 7 summarizes
the conclusions.

Section 5 forms a generalisation of a previous paper [9],
on the best rank-1 approximation of a given tensor.

For simplicity of notation, our discussion is restricted to
the case of real-valued data. The results are readily generalized
for complex tensors.

2.  Basic  definitions  and  notations 

Multilinear algebra is the algebra of higher-order tensors.
For the purpose of this paper, higher-order tensors can intuitively
be imagined as multi-indexed arrays of numerical
values. The tensors of interest in HOS (higher-order moments,
cumulants, ...) are super-symmetric: in the real-valued
case, this means that they remain unchanged under
arbitrary index permutations. We subsequently touch on
(a) some rank-related definitions, (b) the multiplication of a
tensor by a matrix, (c) a standardized representation in matrix
format, and (d) some geometry-related concepts, such
as scalar product, orthogonality and norm.

Definition 1 (Rank-1 tensor) An Nth-order tensor $\mathcal{A}$ has
rank 1 when it equals the outer product of N vectors $U, V, \ldots, Z$:

$\mathcal{A} = U \circ V \circ \ldots \circ Z$    (2)

i.e., $a_{i_1 i_2 \ldots i_N} = u_{i_1} v_{i_2} \ldots z_{i_N}$ for all index values.

Definition 2 (Rank) The rank of an arbitrary Nth-order
tensor $\mathcal{A}$, denoted by $r = \mathrm{rank}(\mathcal{A})$, is the minimal number
of rank-1 tensors that yield $\mathcal{A}$ in a linear combination.

Definition 3 (n-rank) The n-rank of $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_N}$,
denoted by $r_n$, is defined as the dimension of the vector
space generated by the $I_1 I_2 \ldots I_{n-1} I_{n+1} \ldots I_N$ vectors
obtained by varying the index $i_n$ of $a_{i_1 i_2 \ldots i_N}$ while keeping
the other indices fixed.

The definition of n-rank generalizes the definitions of
column and row rank of matrices. Def. 2 gives another
way to extend the concept of rank to higher-order tensors.
In contrast to the matrix case, the different n-ranks of a
higher-order tensor are not necessarily the same; and when
they are equal (e.g. as a result of super-symmetry), they can
still be different from the rank of the tensor. An Nth-order
tensor with n-ranks $r_1, r_2, \ldots, r_N$ will be denoted as a
rank-$(r_1, r_2, \ldots, r_N)$ tensor. An interesting discussion of
rank-related issues for super-symmetric tensors, with bibliographic
pointers to the early literature, can be found in [5].

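The product of two matrices can be generalized to the
product of a tensor and a matrix as follows:

Definition 4 The n-mode product of a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_N}$
by a matrix $\mathbf{U} \in \mathbb{R}^{J_n \times I_n}$, denoted by
$\mathcal{A} \times_n \mathbf{U}$, is an $(I_1 \times I_2 \times \ldots \times J_n \times \ldots \times I_N)$-tensor given
by

$(\mathcal{A} \times_n \mathbf{U})_{i_1 i_2 \ldots j_n \ldots i_N} = \sum_{i_n} a_{i_1 i_2 \ldots i_n \ldots i_N} \, u_{j_n i_n}$

for all index values.

The n-mode product allows us to express the effect of
basis transformations on a given tensor. As an example, for
the matrices $\mathbf{F} \in \mathbb{R}^{I_1 \times I_2}$, $\mathbf{U} \in \mathbb{R}^{J_1 \times I_1}$, $\mathbf{V} \in \mathbb{R}^{J_2 \times I_2}$, the
matrix product $\mathbf{U} \cdot \mathbf{F} \cdot \mathbf{V}^T$ can be written as the "symmetric"
expression $\mathbf{F} \times_1 \mathbf{U} \times_2 \mathbf{V}$.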

It is often more familiar to formulate tensor expressions
in a matrix language. We define the following standard matrix
representation of a higher-order tensor:

Definition 5 The mode-n matrix unfolding of
$\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_N}$, denoted by $\mathbf{A}_{(n)}$, is the
$(I_n \times I_1 I_2 \ldots I_{n-1} I_{n+1} \ldots I_N)$-matrix that contains
the element $a_{i_1 i_2 \ldots i_N}$ at the position given by row number
$i_n$ and column number

$(i_1 - 1) I_2 I_3 \ldots I_{n-1} I_{n+1} \ldots I_N + (i_2 - 1) I_3 \ldots I_{n-1} I_{n+1} \ldots I_N + \ldots + i_N$.

Remark that the n-mode product $\mathcal{B} = \mathcal{A} \times_n \mathbf{U}$ can be
translated as $\mathbf{B}_{(n)} = \mathbf{U} \cdot \mathbf{A}_{(n)}$, and that the rank of $\mathbf{A}_{(n)}$
corresponds to the n-rank of $\mathcal{A}$, even in a numerical sense.
For super-symmetric tensors, the various matrix unfoldings
are equal, and will simply be denoted as $\mathbf{A}$.
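
The definitions above can be sketched in NumPy as follows. The helper names `unfold`, `mode_product` and `n_rank` are hypothetical, and modes are counted from 0 as is usual in Python, whereas the text counts from 1. The column ordering of `unfold` follows Definition 5 (last remaining index varying fastest).

```python
import numpy as np

def unfold(A, n):
    """Mode-n unfolding: rows indexed by i_n, columns by the remaining
    indices in their original order, the last one varying fastest."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def mode_product(A, U, n):
    """n-mode product A x_n U: sums over index i_n of A and column index of U."""
    out = np.tensordot(U, A, axes=(1, n))    # the new index j_n comes first
    return np.moveaxis(out, 0, n)            # move it back to position n

def n_rank(A, n, tol=1e-10):
    """n-rank of A, computed as the numerical rank of the mode-n unfolding."""
    return np.linalg.matrix_rank(unfold(A, n), tol=tol)

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4, 5))
U = rng.standard_normal((2, 4))
B = mode_product(A, U, 1)                    # a (3 x 2 x 5)-tensor

# Matrix case: F x_1 U x_2 V equals U . F . V^T
F = rng.standard_normal((3, 4))
Um = rng.standard_normal((2, 3))
Vm = rng.standard_normal((5, 4))
UFVt = mode_product(mode_product(F, Um, 0), Vm, 1)

# A rank-1 tensor has all n-ranks equal to 1
rank1 = np.einsum('i,j,k->ijk', rng.standard_normal(3),
                  rng.standard_normal(4), rng.standard_normal(5))
ranks = [int(n_rank(rank1, n)) for n in range(3)]
```

The unfolding of `B` along mode 1 coincides with `U @ unfold(A, 1)`, which is the matrix translation of the n-mode product mentioned in the remark.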

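The definitions of scalar product, orthogonality and
Frobenius-norm can be generalized in a trivial way:

Definition 6 The scalar product $\langle \mathcal{A}, \mathcal{B} \rangle$ of two tensors
$\mathcal{A}, \mathcal{B} \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_N}$ is defined as

$\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1} \sum_{i_2} \ldots \sum_{i_N} a_{i_1 i_2 \ldots i_N} \, b_{i_1 i_2 \ldots i_N}$    (3)

Definition 7 Tensors of which the scalar product equals 0
are mutually orthogonal.

Definition 8 The Frobenius-norm of a tensor $\mathcal{A}$ is given by

$\|\mathcal{A}\| = \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle}$    (4)

The Frobenius-norm can be interpreted as the "size" of
the tensor. The square of this norm can be seen as the "energy"
in the tensor.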

3. Dimensionality reduction in ICA with and without prewhitening

Let us first recall some more formal aspects of the procedure
of prewhitening. The ICA-model implies that the
covariance $\mathbf{C}_2^Y$ of $Y$ is given by

$\mathbf{C}_2^Y = \mathbf{C}_2^X \times_1 \mathbf{M} \times_2 \mathbf{M} + \mathbf{C}_2^N = \mathbf{M} \mathbf{C}_2^X \mathbf{M}^T + \mathbf{C}_2^N$    (5)

in which $\mathbf{C}_2^X$ and $\mathbf{C}_2^N$ symbolize the covariance of $X$ resp.
$N$. $\mathbf{C}_2^X$ is diagonal, since the source signals are uncorrelated.
We can additionally assume that the source signals
have unit variance, which at most involves an irrelevant
rescaling of the columns of $\mathbf{M}$. As a consequence, a first
estimate of $\mathbf{M}$, up to an orthogonal factor $\mathbf{V}$, can be obtained
in the form of a square root $\mathbf{F}$ of $\mathbf{C}_2^Y$:

$\mathbf{C}_2^Y = \mathbf{F} \mathbf{F}^T = (\mathbf{F} \mathbf{V}^T)(\mathbf{F} \mathbf{V}^T)^T \simeq \mathbf{M} \mathbf{M}^T$    (6)

Substitution of the Singular Value Decomposition (SVD)
$\mathbf{M} = \mathbf{U} \cdot \mathbf{S} \cdot \mathbf{V}^T$ of the mixing matrix leads to a classical
PCA

$\mathbf{C}_2^Y = \mathbf{U} \mathbf{S}^2 \mathbf{U}^T + \mathbf{C}_2^N = (\mathbf{U}\mathbf{S})(\mathbf{U}\mathbf{S})^T + \mathbf{C}_2^N$    (7)

in which the left singular matrix $\mathbf{U}$ and the matrix of singular
values $\mathbf{S}$ are estimated with an EVD of the observed
covariance (or, numerically preferable, the direct SVD of
the dataset), while the right singular matrix $\mathbf{V}$ remains unknown.
Let us denote the dimension of $Y$ by $I$ and the
number of sources by $R$. If $I > R$, then the dominant $R$-dimensional
eigenspace of $\mathbf{C}_2^Y$ is retained. If the noise is
known to be spatially white, then its variance $\sigma^2$ can be estimated
as the mean of the smallest $I - R$ eigenvalues of
$\mathbf{C}_2^Y$.

On the other hand, the fourth-order cumulant $\mathcal{C}_4^Y$ of $Y$
can be expressed in a similar way as the covariance in
Eq. (5):

$\mathcal{C}_4^Y = \mathcal{C}_4^X \times_1 \mathbf{M} \times_2 \mathbf{M} \times_3 \mathbf{M} \times_4 \mathbf{M} + \mathcal{C}_4^N$    (8)

In this equation, $\mathcal{C}_4^X$ and $\mathcal{C}_4^N$ symbolize the fourth-order cumulant
tensors of $X$ resp. $N$; the latter vanishes if the noise
is Gaussian. Like $\mathbf{C}_2^X$, $\mathcal{C}_4^X$ is diagonal, as the source signals
are statistically independent. Due to the diagonality property,
Eq. (8) can be rewritten as

$\mathcal{C}_4^Y = \sum_{r=1}^{R} \kappa_r \, M_r \circ M_r \circ M_r \circ M_r$    (9)

in which $\kappa_r$ represents the marginal cumulant of the rth
source (we assume that the sources are kurtic), and in which
$M_r$ denotes the rth column of $\mathbf{M}$. Eq. (9) is a decomposition
of $\mathcal{C}_4^Y$ in a minimal number of rank-1 terms, as the columns
of $\mathbf{M}$ are assumed to be linearly independent; it can even
be proved that the decomposition is essentially unique, up
to some trivial indeterminacies [14]. As a consequence, the
aim of higher-order-only ICA can be formulated as the computation
of a rank-revealing decomposition of $\mathcal{C}_4^Y$, taking
into account that the sample cumulant equivalent of Eq. (9)
may be perturbed by non-Gaussian noise components, finite
datalength effects, model misfit, etc.
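
The structure of Eqs. (8) and (9) can be checked numerically. The sketch below builds the noise-free fourth-order cumulant tensor both ways for a hypothetical mixing matrix and hypothetical source kurtoses `kappa` (all values are made up for illustration), and verifies that the result is super-symmetric with n-rank R.

```python
import numpy as np

rng = np.random.default_rng(2)
I, R = 5, 3
M = rng.standard_normal((I, R))              # mixing matrix, independent columns
kappa = np.array([-1.2, 0.8, 2.0])           # hypothetical marginal cumulants

# Eq. (9): sum of R rank-1 terms kappa_r * M_r o M_r o M_r o M_r
C4_sum = np.einsum('r,ir,jr,kr,lr->ijkl', kappa, M, M, M, M)

# Eq. (8) without noise: diagonal source cumulant multiplied by M in every mode
C4_X = np.zeros((R, R, R, R))
for r in range(R):
    C4_X[r, r, r, r] = kappa[r]
C4_modes = np.einsum('abcd,ia,jb,kc,ld->ijkl', C4_X, M, M, M, M)

# All unfoldings of a super-symmetric tensor coincide; their rank is the n-rank
unfolding = C4_sum.reshape(I, -1)
n_rank = int(np.linalg.matrix_rank(unfolding, tol=1e-8))
```

With more sensors than sources (here I = 5, R = 3) the unfolding is rank deficient, which is the property exploited for dimensionality reduction in what follows.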

In addition, according to the definition of n-rank, every
tensor satisfying the assumptions of Eq. (9) is a rank-$(R, R, R, R)$
tensor. To deal with the situation in which
$I > R$, we propose to perform an orthogonal projection
of the sample estimate of $\mathcal{C}_4^Y$ on the manifold of super-symmetric
rank-$(R, R, R, R)$ tensors, before addressing the
harder problem of the further projection on the submanifold
of rank-$R$ tensors and the actual computation of decomposition
(9). Notice that in the second-order case both projections
coincide, as n-rank and rank are necessarily the same,
which corresponds to the single best rank-$R$ approximation
problem, associated with Eq. (7).

Formally, the problem we want to solve can be formulated
as follows:

Given a sample cumulant $\mathcal{A} \in \mathbb{R}^{I \times I \times I \times I}$, find a super-symmetric
tensor $\hat{\mathcal{A}} \in \mathbb{R}^{I \times I \times I \times I}$ with $r_1 = r_2 = r_3 = r_4 = R$,
that minimizes the least-squares criterion

$f(\hat{\mathcal{A}}) = \|\mathcal{A} - \hat{\mathcal{A}}\|^2$    (10)

The conditions imply that $\hat{\mathcal{A}}$ can be decomposed as:

$\hat{\mathcal{A}} = \mathcal{B} \times_1 \mathbf{U} \times_2 \mathbf{U} \times_3 \mathbf{U} \times_4 \mathbf{U}$    (11)

in which $\mathbf{U} \in \mathbb{R}^{I \times R}$ has orthonormal columns and
$\mathcal{B} \in \mathbb{R}^{R \times R \times R \times R}$ is super-symmetric. After computation of
the best rank-$(R, R, R, R)$ approximation, the actual ICA-algorithm
can be applied to $\mathcal{B}$ instead of $\mathcal{A}$; $\mathcal{B}$ can be interpreted
as the cumulant tensor of the $R$-dimensional stochastic
vector $Z = \mathbf{U}^T Y$.

4. Best rank-$(R, R, R, R)$ approximation and Higher-Order Eigenvalue Decomposition

In this section we discuss the tensorial equivalent of the
computation of the best rank-$R$ approximation of a symmetric
matrix by truncation of its symmetric EVD. First,
the decomposition itself can be generalized as follows:

Theorem 1 Every super-symmetric Nth-order tensor
$\mathcal{A} \in \mathbb{R}^{I \times I \times \ldots \times I}$ can be written as

$\mathcal{A} = \mathcal{S} \times_1 \mathbf{U} \times_2 \mathbf{U} \ldots \times_N \mathbf{U}$    (12)

in which:

•  $\mathbf{U} = [U_1 U_2 \ldots U_I]$ is an orthogonal $(I \times I)$-matrix

•  the core tensor $\mathcal{S}$ is a super-symmetric $(I \times I \times \ldots \times I)$-tensor
of which the subtensors $\mathcal{S}_{i_n = \alpha}$, obtained by fixing
one arbitrary index to $\alpha$, have the properties of:
-  all-orthogonality: two subtensors $\mathcal{S}_{i_n = \alpha}$ and $\mathcal{S}_{i_n = \beta}$
are orthogonal for all possible values of $n$, $\alpha$ and $\beta$
subject to $\alpha \neq \beta$:

$\langle \mathcal{S}_{i_n = \alpha}, \mathcal{S}_{i_n = \beta} \rangle = 0$ when $\alpha \neq \beta$    (13)

-  ordering:

$\|\mathcal{S}_{i_n = 1}\| \geq \|\mathcal{S}_{i_n = 2}\| \geq \ldots \geq \|\mathcal{S}_{i_n = r_n}\| > 0$    (14)

and

$\|\mathcal{S}_{i_n = r_n + 1}\| = \ldots = \|\mathcal{S}_{i_n = I}\| = 0$    (15)

The Frobenius-norms $\|\mathcal{S}_{i_n = i}\|$, symbolized by $s_i$, are the
higher-order eigenvalues of $\mathcal{A}$ and the vector $U_i$ is the ith
higher-order eigenvector.

This Higher-Order Eigenvalue Decomposition (HOEVD)
is the super-symmetric case of the Higher-Order Singular
Value Decomposition (HOSVD), discussed in [8]. Clearly,
the HOEVD is a formal generalization of the EVD of symmetric
matrices. Moreover, it can be proved that the HOEVD
of a second-order symmetric tensor essentially boils
down to its matrix EVD.

The matrix $\mathbf{U}$ can be computed as the left singular matrix
of the matrix unfolding $\mathbf{A}$. The core tensor $\mathcal{S}$ can then be
found from the following form of Eq. (12):

$\mathcal{S} = \mathcal{A} \times_1 \mathbf{U}^T \times_2 \mathbf{U}^T \ldots \times_N \mathbf{U}^T$    (16)

The higher-order eigenvalues correspond to the singular
values of $\mathbf{A}$; as such, they give numerical information on
the n-rank of $\mathcal{A}$. If Eq. (9) is exactly satisfied, then the
number of non-zero higher-order eigenvalues corresponds
to the number of kurtic sources. In practice, the number of
sources will be estimated by determining the hypothesized
gap between "signal" and "noise" higher-order eigenvalues.

The properties of the higher-order eigenvectors and
eigenvalues, together with the ordering convention, suggest
that setting $s_{R+1} = s_{R+2} = \ldots = s_I = 0$ will result in a
good rank-$(R, R, R, R)$ approximation of $\mathcal{A}$. Remarkably
enough, it turns out that this approximation is in general not
the globally optimal one. A second important difference
with matrices is that the criterion function $f(\hat{\mathcal{A}})$ can exhibit
spurious local optima. On the other hand, we have observed
that the HOEVD-approximation generally belongs
to the valley of $f$ that contains the global optimum. There
is no absolute guarantee: we have been able to generate
ill-conditioned cases in which gradient descent, starting
from the HOEVD-truncate, eventually leads to a local optimum
with a close to optimal fit, but nevertheless different
from the global optimum [6]. In practice, however, no problems
of this kind have been observed.
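
The HOEVD and its truncation can be sketched in NumPy as follows for a fourth-order super-symmetric tensor. The function names `hoevd` and `truncate` and the test tensor are illustrative assumptions. The example uses an exact rank-(R, R, R, R) tensor, for which truncation is lossless; for noisy tensors the truncation is only an approximation, as discussed above.

```python
import numpy as np

def hoevd(A):
    """HOEVD of a super-symmetric 4th-order tensor A (I x I x I x I)."""
    I = A.shape[0]
    # U: left singular matrix of the unfolding; s: higher-order eigenvalues
    U, s, _ = np.linalg.svd(A.reshape(I, -1), full_matrices=False)
    # Core tensor, Eq. (16): S = A x_1 U^T x_2 U^T x_3 U^T x_4 U^T
    S = np.einsum('ijkl,ia,jb,kc,ld->abcd', A, U, U, U, U)
    return U, s, S

def truncate(A, R):
    """Rank-(R,R,R,R) approximation obtained by keeping R eigenvectors."""
    U, _, _ = hoevd(A)
    UR = U[:, :R]
    B = np.einsum('ijkl,ia,jb,kc,ld->abcd', A, UR, UR, UR, UR)
    A_hat = np.einsum('abcd,ia,jb,kc,ld->ijkl', B, UR, UR, UR, UR)
    return A_hat, UR, B

# Hypothetical noise-free cumulant of R = 2 kurtic sources seen by I = 6 sensors
rng = np.random.default_rng(3)
I, R = 6, 2
M = rng.standard_normal((I, R))
A = np.einsum('r,ir,jr,kr,lr->ijkl', np.array([1.5, -1.0]), M, M, M, M)

U, s, S = hoevd(A)
sub_norms = np.sqrt((S ** 2).sum(axis=(1, 2, 3)))   # norms of the subtensors
gram = S.reshape(I, -1) @ S.reshape(I, -1).T        # their mutual scalar products
A_hat, UR, B = truncate(A, R)
```

Here `gram` is diagonal (all-orthogonality), `sub_norms` equals the singular values `s`, and only the first R entries of `s` are significantly different from zero.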


5. Higher-order orthogonal iterations

In this section we briefly discuss an ALS algorithm for
the local optimisation of $f$, that can be interpreted as a
higher-order generalization of the method of orthogonal iterations
[12].

First, observe that the minimisation of $f$ is equivalent to
the maximisation of the following function $g(\mathbf{U})$ over the
matrices with mutually orthonormal columns:

$g(\mathbf{U}) = \|\mathcal{A} \times_1 \mathbf{U}^T \times_2 \mathbf{U}^T \times_3 \mathbf{U}^T \times_4 \mathbf{U}^T\|^2$    (17)

Proving this relationship falls outside the scope of this article,
but as a motivation we can remark that, similarly, the
computation of the best rank-$R$ approximation of a symmetric
matrix $\mathbf{H}$ is equivalent to the determination of a column-wise
orthonormal matrix $\mathbf{U}$ that maximizes the Frobenius-norm
of $\mathbf{U}^T \cdot \mathbf{H} \cdot \mathbf{U}$.

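Next, let us embed the maximisation of $g$ in the maximisation,
over the matrices $\mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \mathbf{U}^{(3)}, \mathbf{U}^{(4)} \in \mathbb{R}^{I \times R}$ with
orthonormal columns, of the function $\tilde{g}$, defined as:

$\tilde{g}(\mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(4)}) = \|\mathcal{A} \times_1 \mathbf{U}^{(1)T} \ldots \times_4 \mathbf{U}^{(4)T}\|^2$    (18)

Because of the super-symmetry of $\mathcal{A}$, the global optimum
will satisfy $\mathbf{U}^{(1)} = \ldots = \mathbf{U}^{(4)}$.

Now imagine that e.g. the matrices $\mathbf{U}^{(2)}$, $\mathbf{U}^{(3)}$ and $\mathbf{U}^{(4)}$
are fixed, so that $\tilde{g}$ in Eq. (18) is merely a quadratic expression
in the components of the unknown matrix $\mathbf{U}^{(1)}$. In
matrix notation, we have:

$\tilde{g} = \|\mathbf{U}^{(1)T} \cdot [\mathbf{A}_{(1)} \cdot (\mathbf{U}^{(2)} \otimes \mathbf{U}^{(3)} \otimes \mathbf{U}^{(4)})]\|^2$    (19)

in which $\otimes$ denotes the Kronecker product. Hence the
columns of $\mathbf{U}^{(1)}$ can be found as an orthonormal basis for
the dominant subspace of the column space of the matrix
between square brackets. Repeating this procedure for different
mode numbers leads to an ALS algorithm for the (local)
minimisation of $f(\hat{\mathcal{A}})$: in each step the estimate of one
of the matrices $\mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(4)}$ is optimized, while the other
matrix estimates are kept constant.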

Clearly this technique is a higher-order extension of the
orthogonal iteration method for matrices. A major difference
is that each iteration step does not involve a linear but
a multilinear transformation. A second important difference
is that an iteration step also implies the estimation of a dominant
subspace, instead of a mere orthonormalisation. It is
inherent to the ALS-mechanism that higher-order orthogonal
iterations yield unsymmetric intermediate results.
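
The iteration described above can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions, not the authors' code: the names `multiply` and `hooi`, the fixed number of sweeps and the noisy test tensor are all hypothetical. The starting value is the truncated HOEVD, and `history` records the objective of Eq. (18) after each sweep.

```python
import numpy as np
from itertools import permutations

def unfold(A, n):
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def multiply(A, mats, skip=None):
    """Apply the n-mode product with mats[n] in every mode n except `skip`."""
    for n, Mn in enumerate(mats):
        if n != skip:
            A = np.moveaxis(np.tensordot(Mn, A, axes=(1, n)), 0, n)
    return A

def hooi(A, R, n_sweeps=20):
    N = A.ndim
    # Initialisation: dominant left singular vectors of each unfolding
    Us = [np.linalg.svd(unfold(A, n), full_matrices=False)[0][:, :R]
          for n in range(N)]
    history = [np.sum(multiply(A, [U.T for U in Us]) ** 2)]
    for _ in range(n_sweeps):
        for n in range(N):
            # Matrix between brackets in Eq. (19), written for mode n
            W = unfold(multiply(A, [U.T for U in Us], skip=n), n)
            # Orthonormal basis of its dominant R-dimensional column space
            Us[n] = np.linalg.svd(W, full_matrices=False)[0][:, :R]
        history.append(np.sum(multiply(A, [U.T for U in Us]) ** 2))
    core = multiply(A, [U.T for U in Us])
    return Us, core, history

# Hypothetical test tensor: rank-(2,2,2,2) signal plus super-symmetric noise
rng = np.random.default_rng(4)
I, R = 6, 2
M = rng.standard_normal((I, R))
signal = np.einsum('r,ir,jr,kr,lr->ijkl', np.array([1.5, -1.0]), M, M, M, M)
noise = rng.standard_normal((I, I, I, I))
noise = sum(noise.transpose(p) for p in permutations(range(4))) / 24.0
A = signal + 0.05 * noise

Us, core, history = hooi(A, R)
A_hat = multiply(core, Us)                   # rank-(R,R,R,R) approximation
```

Each mode update maximizes the objective with the other matrices fixed, so `history` is non-decreasing, and the squared approximation error equals the energy of `A` minus the final objective value.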

6. Gradient-based optimisation

As an alternative for ALS-iterations, one could consider
the use of more general-purpose optimisation routines, in
which the unknown matrix $\mathbf{U}$ is situated on the Stiefel manifold
of column-wise orthonormal $(I \times R)$-matrices. The
symmetry can be retained during iteration by simply treating
all the modes in the same way. For the optimisation
procedure it is possible to exploit knowledge of the gradient.
More specifically, the gradient of $g$ over the Stiefel
manifold takes the following form:

$\nabla_{\mathbf{U}} g = 8 \, (\mathbf{I} - \mathbf{U}\mathbf{U}^T) \cdot \tilde{\mathbf{U}} \cdot \tilde{\mathbf{U}}^T \cdot \mathbf{U}$    (20)

in which

$\tilde{\mathbf{U}} = \mathbf{A} \cdot (\mathbf{U} \otimes \mathbf{U} \otimes \mathbf{U})$    (21)

Expression (20) consists of a sum of 4 equal terms, as Eq.
(17) involves the matrix $\mathbf{U}$ 4 times, in a super-symmetric
way. For each of the terms, the symmetric version of
Eq. (19) can be interpreted as a classical subspace tracking
problem, with fixed matrix $\tilde{\mathbf{U}}$.
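
The gradient expression can be checked numerically. The sketch below (function names and the test tensor are hypothetical) compares the unprojected factor $8 \tilde{\mathbf{U}} \tilde{\mathbf{U}}^T \mathbf{U}$ with a central finite difference of the objective of Eq. (17), and then applies the factor $(\mathbf{I} - \mathbf{U}\mathbf{U}^T)$ of Eq. (20).

```python
import numpy as np

def objective(A, U):
    """g(U) of Eq. (17) for a super-symmetric 4th-order tensor A."""
    B = np.einsum('ijkl,ia,jb,kc,ld->abcd', A, U, U, U, U)
    return np.sum(B ** 2)

def euclidean_grad(A, U):
    """Unconstrained gradient 8 U~ U~^T U, with U~ as in Eq. (21)."""
    I = A.shape[0]
    U_tilde = A.reshape(I, -1) @ np.kron(U, np.kron(U, U))   # I x R^3
    return 8.0 * U_tilde @ U_tilde.T @ U

def stiefel_grad(A, U):
    """Eq. (20): projection of the gradient by (I - U U^T)."""
    G = euclidean_grad(A, U)
    return G - U @ (U.T @ G)

rng = np.random.default_rng(6)
I, R = 4, 2
M = rng.standard_normal((I, 3))
A = np.einsum('r,ir,jr,kr,lr->ijkl', np.array([1.0, -0.7, 0.4]), M, M, M, M)
U = np.linalg.qr(rng.standard_normal((I, R)))[0]   # orthonormal columns
D = rng.standard_normal((I, R))                    # arbitrary direction

h = 1e-5
fd = (objective(A, U + h * D) - objective(A, U - h * D)) / (2 * h)
analytic = np.sum(euclidean_grad(A, U) * D)
tangent_check = U.T @ stiefel_grad(A, U)           # should vanish
```

The directional derivative `fd` agrees with `analytic`, and the projected gradient is orthogonal to the columns of `U`.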

7. Conclusions

In an ICA set-up with more observation channels than
source components, the number of sources can conceptually
be estimated without resorting to second-order statistics.
The corresponding reduction of the dimensionality
of the problem takes the form of a multilinear generalization
of the best rank-$R$ approximation of matrices. A first
approximate solution can be obtained by truncation of a
higher-order EVD/SVD-type decomposition, in much the
same way as for matrices. The determination of the optimal
result generally requires some additional fine-tuning. In
this respect, we have paid some attention to an ALS-type
descent technique and to gradient-based optimisation over
the Stiefel manifold.

Abstract

A new method of source separation with noisy
observations is proposed in the case of two sensors. Each
observation contains a mixture of two signals with noise.
The objective is to estimate the frequency responses of the
linear filters that combine the two signals in the data
stream. The main characteristic of the method is that it takes
additive noise into account. No hypotheses on the noise
probability densities are made. To this end, we derive an
original objective function, based on nonlinear functions
of the observations. Specific properties of these functions,
chosen as exponential functions, and the hypothesis of
independent sources lead to a direct solution for the
estimation of the filters. An analytic solution may be
computed from it, using only the data. The convergence
speed of the method and its robustness against non-Gaussian
noise are illustrated in the paper with simulation
results.


1  Introduction 

In this paper, source separation based on a two-sensor
scenario is considered. A general model for this scenario
may be described as follows. Consider two random signals
s(k) and r(k), called 'sources' hereafter. Suppose they are
propagating through a deterministic linear medium such
that we receive two linear combinations x(k) and y(k) on
two sensors. Assuming the sources are statistically
independent, the problem consists of recovering them
from the observed signals only. This problem is generally
called source separation or blind identification.

It is also supposed that the sources are stationary, non-Gaussian
and zero-mean. A general model for the
observations is given in the frequency domain by:

(1)  Xi(n) = Si(n) + f(n) Ri(n) + Vi(n)

(2)  Yi(n) = g(n) Si(n) + Ri(n) + Wi(n)

where Si(n), Ri(n), Xi(n), Yi(n), Vi(n) and Wi(n) are the
N-point discrete Fourier transforms (DFT) of the i-th data
blocks of s(k), r(k), x(k), y(k), v(k) and w(k) at frequency
bin n. Vi(n) and Wi(n) are assumed to be independent
additive noises, and no assumptions are made regarding
their statistical distribution. In contrast, source
separation methods assume that the complex sources Si(n)
and Ri(n) are not strictly complex normal. The model (1)-(2)
represents either instantaneous mixtures of narrow-band
signals, or convolutive mixtures of wide-band
signals. In the second case, f(n) and g(n) are the complex
weights of two linear filters F and G, at frequency bin n.
The goal of source separation is to estimate Si(n) and
Ri(n). Because of the presence of the noises Vi(n) and
Wi(n), it is in general impossible to recover the signals
exactly. We may only give estimates of f(n) and g(n).

This  problem  of  source  separation  has  been  widely 
discussed  in  recent  years  and  has  been  used  for  instance  in 
radar  and  sonar  processing,  speech  enhancement, 
separation  of  rotating  machine  noises,  independent 
component  analysis  or  localization  in  array  processing. 

2  Source  separation  methods 

Recently proposed methods for source separation are based
on the hypothesis of source independence and test different
measures of statistical independence. As it has
been shown that second-order statistics are not
sufficient to give a solution, information contained in the
higher-order statistics has been exploited. Various
solutions relying on the use of fourth-order moments
or cumulants have already been reported (formulation of
contrast functions, maximization of normalized
cumulants) [1] [2] [3] [4]. Other approaches make use of
nonlinear functions, which also bring higher-order statistics
into play (including for example the information
maximization principle) [5] [6].

The authors generally consider a model with no additive
noise (case of adaptive methods or neural networks) or
with Gaussian noise.

We  propose  here  a  new  source  separation  method  which 
derives  from  an  original  objective  function,  based  on 
nonlinear  functions  of  the  observations.  Specific 
properties  of  these  functions,  chosen  as  exponential 
functions,  and  the  hypothesis  of  independent  sources  lead 
to  a  direct  solution  for  the  estimation  of  the  coefficients 
f(n),  g(n).  Proposed  expressions  are  deduced  from  the 
objective  function  without  any  hypotheses  on  the 
probability  density  of  the  noise  signals. 

3  A  new  method  of  source  separation 
with  noisy  observations 

Let us define the estimates of the sources as two linear
combinations of Xi(n) and Yi(n):

(3)  X'i(n) = Xi(n) - α(n) Yi(n)

(4)  Y'i(n) = Yi(n) - β(n) Xi(n)

The two sources are separated if the pair (α(n), β(n)) is
equal to (f(n),g(n)) or (1/g(n),1/f(n)). The approach is to
formulate a two-dimensional complex function F(α(n),
β(n)) such that its minimum value is obtained for
(f(n),g(n)) and (1/g(n),1/f(n)), and this value is unchanged,
whatever the power or the statistical distribution of the
noises. In order to avoid the problems generally inherent
to adaptive algorithms (local minima, influence of the
initialization, low convergence speed), we seek a direct
computation for the estimation of f(n) and g(n).
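
The effect of the two separating pairs can be illustrated at a single frequency bin with synthetic data. In the sketch below, the filter gains `f` and `g`, the source and noise distributions and the function name `separate` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 50000                                    # number of data blocks i

# Zero-mean, independent, non-Gaussian complex sources at one bin n
S = rng.laplace(size=T) + 1j * rng.laplace(size=T)
Rr = rng.uniform(-1, 1, T) + 1j * rng.uniform(-1, 1, T)
# Independent additive noises
V = 0.1 * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
W = 0.1 * (rng.standard_normal(T) + 1j * rng.standard_normal(T))

f, g = 0.5 + 0.3j, -0.4 + 0.6j               # hypothetical filter gains
X = S + f * Rr + V                           # model (1)
Y = g * S + Rr + W                           # model (2)

def separate(X, Y, alpha, beta):
    """Linear combinations (3) and (4)."""
    return X - alpha * Y, Y - beta * X

X1, Y1 = separate(X, Y, f, g)                # first separating pair
X2, Y2 = separate(X, Y, 1 / g, 1 / f)        # second separating pair
```

With (α, β) = (f, g), X' contains only the source S and Y' only the source R, each scaled by 1 - f g; with (1/g, 1/f) the two outputs are swapped. In both cases a noise term remains, which is why the sources cannot be recovered exactly.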

We first define nonlinear functions of the combinations
X'i(n) and Y'i(n). The influence of the noises can be
eliminated only if the nonlinear functions can be
separated into a product of functions of Si(n), Ri(n), Vi(n)
or Wi(n). The nonlinearity is necessarily chosen as an
exponential complex function, for example e^{jXi(n)} and
e^{jYi(n)}. This point will be discussed in §3.4.

3.1  Determination of a family of nonlinear functions

Let us develop the following nonlinear functions:
E{X'i(n).Y'i(n)*.e^{jXi(n)}}, E{X'i(n).Y'i(n)*.e^{jYi(n)}},
where * represents the conjugate of a complex value.

This particular function can be separated into a product of
functions of Si(n), Ri(n) and Vi(n). It will allow us to
eliminate first the terms containing the sources Si(n),
Ri(n), and second the terms containing the noise signals
Vi(n), Wi(n).

Due to the hypothesis of independence between the source
and noise signals Si(n), Ri(n), Vi(n) and Wi(n), and the
exponential nonlinearity, we obtain complex
polynomials of second degree in α(n) and β(n). The
complex coefficients contain: E{|Si(n)|².e^{jSi(n)}},
E{|Ri(n)|².e^{jRi(n)}}, E{|Vi(n)|².e^{jVi(n)}} and
E{|Wi(n)|².e^{jWi(n)}}.

These terms can be computed with only nonlinear
functions of the observations such as:

E{Xi(n) e^{jXi(n)}}, E{Xi(n).Yi(n)* e^{jXi(n)}}, E{e^{jXi(n)}},
E{|Xi(n)|².e^{jXi(n)}}, E{Yi(n) e^{jXi(n)}}, ...

This allows us to write two two-dimensional complex
functions F(α(n),β(n)), such that F(f(n),g(n)) and
F(1/g(n),1/f(n)) depend only on f(n), g(n) and the
observations. More details are given in the appendix.

(5)  F1(α(n),β(n)) = J0(n) + J1(n) α(n) + J2(n) α(n)β(n)*
     + J3(n) β(n)*

where:

J0(n) = E{Xi(n)Yi(n)* e^{jXi(n)}} - E{e^{jXi(n)}}.E{Xi(n)Yi(n)*}

J1(n) = -E{|Yi(n)|² e^{jXi(n)}} + E{e^{jXi(n)}}.E{|Yi(n)|²}

J2(n) = J0(n)*

J3(n) = -E{|Xi(n)|² e^{jXi(n)}} + E{e^{jXi(n)}}.E{|Xi(n)|²}

The second function F2(α(n),β(n)) is similar to
F1(α(n),β(n)), replacing e^{jXi(n)} with e^{jYi(n)}.

Each function F(α(n),β(n)) verifies:

(6)  F(f(n),g(n)) / F(1/g(n),1/f(n)) = f(n)* g(n)*

It can also be written that the pairs (f(n),g(n)) and
(1/g(n),1/f(n)) cancel the functions G(α(n),β(n)) defined
as:

(7)  G(α(n),β(n)) = F(α(n),β(n)) - α(n)* β(n)* F(1/β(n),1/α(n))

Equation (7) has an infinite number of solutions,
but we see from it that, for the expected solutions
(f(n),g(n)) and (1/g(n),1/f(n)), the ratio

(8)  [1 + (J0(n)/J1(n)) β(n)] / [1 + (J0(n)*/J1(n)) β(n)*]

is equal to:

[1 - f(n)g(n)] / [1 - f(n)*g(n)*]   and   ([1 - f(n)g(n)] / [1 - f(n)*g(n)*]) . (f(n)*g(n)*) / (f(n)g(n)),

respectively. We remark that it depends only on the values of f(n), g(n) and
not on the signals.

Suppose now the new observations:

(9)  NXi(n) = λXi(n) = λSi(n) + f(n) λRi(n) + λVi(n)
     NYi(n) = λYi(n) = g(n) λSi(n) + λRi(n) + λWi(n)

where λ is a real number different from 1.

The values NJ0(n) and NJ1(n), computed from equation
(9) with NXi(n) and NYi(n), are different from J0(n) and
J1(n), computed with Xi(n) and Yi(n). As previously, the
ratio (8) depends only on f(n) and g(n) and not on the
signals. As a result, we obtain:

(10)  [1 + (J0(n)/J1(n)) β(n)] / [1 + (J0(n)*/J1(n)) β(n)*]
      = [1 + (NJ0(n)/NJ1(n)) β(n)] / [1 + (NJ0(n)*/NJ1(n)) β(n)*]

From (10), we deduce that the complex solutions β(n)
(including g(n) and 1/f(n)) belong to a circle C0, using a
geometrical interpretation of the complex space.

(11)  [J0(n)NJ0(n)* / (J1(n)NJ1(n)) - J0(n)*NJ0(n) / (J1(n)NJ1(n))] |β(n)|²
      + [J0(n)/J1(n) - NJ0(n)/NJ1(n)] β(n)
      + [NJ0(n)*/NJ1(n) - J0(n)*/J1(n)] β(n)* = 0

Due to the nonlinearity of the exponential function,
NJ0(n) and NJ1(n) depend on λ and are not proportional to
J0(n) and J1(n). This defines a family of nonlinear functions
F1(α(n),β(n),λ).

C0 is identified using (11). J0(n)/J1(n) and NJ0(n)/NJ1(n)
are computed for two values of λ (for example λ=1 and
λ=0.5). The statistical moments are estimated by averages
over the data.

3.2  Proof of an analytical solution


Analyse now the second family of functions
F2(α(n),β(n),λ), obtained by replacing Xi(n) with Yi(n) in the
exponential term (or NXi(n) with NYi(n)).

(12)  F2(α(n),β(n)) = K0(n) + K1(n) α(n) + K2(n) α(n)β(n)* +
      K3(n) β(n)*

K0(n) = E{Xi(n)Yi(n)* e^{j(Yi(n))}} - E{e^{j(Yi(n))}}.E{Xi(n)Yi(n)*}
        - E{Xi(n) e^{j(Yi(n))}} E{e^{j(Yi(n))}}^{-1} E{Yi(n)* e^{j(Yi(n))}}

K1(n) = -E{|Yi(n)|² e^{j(Yi(n))}} + E{e^{j(Yi(n))}}.E{|Yi(n)|²}
        + E{Yi(n) e^{j(Yi(n))}} E{e^{j(Yi(n))}}^{-1} E{Yi(n)* e^{R(Yi(n))+I(Yi(n))}}

K2(n) = K0(n)*

K3(n) = -E{|Xi(n)|² e^{j(Yi(n))}} + E{e^{j(Yi(n))}}.E{|Xi(n)|²}


As we did before, we compute the values of F2(α(n),β(n))
for the pairs (f(n),g(n)) and (1/g(n),1/f(n)):

(13)  F2(f(n),g(n)) = f(n).M(Wi(n),Yi(n))

where M depends on the noise and the observations:

M(Wi(n),Yi(n)) = -E{|Wi(n)|².e^{j(Yi(n))}} + E{|Wi(n)|²}.E{e^{j(Yi(n))}}

(14)  F2(1/g(n),1/f(n)) = 1/g(n).M(Wi(n),Yi(n))

(13) and (14) provide a second equation (15), which
depends only on the observations Xi(n) and Yi(n), exactly
as in (6).

(15)  F2(α(n),β(n)) = α(n)β(n) F2(1/β(n),1/α(n))

(f(n),g(n)) and (1/g(n),1/f(n)) are necessarily solutions of
(15).

(15) may be developed as:

[1 + (K0(n)*/K3(n)) α(n)] [1 - α(n)*β(n)*] =
[1 + (K0(n)/K3(n)) α(n)*] [1 - α(n)β(n)]

As in section 3.1, using two values of λ, we deduce from
(15) that the complex solutions α(n) (including f(n) and
1/g(n)) belong to a circle C1 of equation (16).

(16)  [K0(n)*NK0(n) / (K3(n)NK3(n)) - K0(n)NK0(n)* / (K3(n)NK3(n))] |α(n)|²
      + [K0(n)*/K3(n) - NK0(n)*/NK3(n)] α(n)
      + [NK0(n)/NK3(n) - K0(n)/K3(n)] α(n)* = 0

Consequently, the values of 1/α(n) (including 1/f(n) and
g(n)) belong to a straight line L1, which may be easily
computed from C1. As we know that 1/f(n) and g(n) also
belong to the circle C0, the expected values 1/f(n) and
g(n) are necessarily the two intersections between L1 and
C0.


The cancellation of the four nonlinear functions F(α(n),β(n)) provides only two pairs of solutions (f(n),g(n)) and (1/g(n),1/f(n)). Indeed, the four equations (G(α(n),β(n)) = 0) can be combined into two polynomials in α(n) and two polynomials in β(n) such as:

\[ K_0(n)\,|\alpha(n)|^2 + K_1(n)\,\alpha(n) - K_1(n)^*\alpha(n)^* = 0 \]
\[ L_0(n)^*\alpha(n) - L_0(n)\,\alpha(n)^* + L_1(n) = 0 \tag{17} \]

We deduce from (17) that the complex solutions α(n) (respectively β(n)) belong to a circle C0 and a straight line L0 (respectively C1 and L1). We verify that the two intersections between C0 and L0 (respectively C1 and L1) are f(n) and 1/g(n) (respectively g(n) and 1/f(n)).

The complex coefficients K0(n), K1(n), L0(n) and L1(n) depend only on the statistics of nonlinear functions of the observations Xi(n) and Yi(n) and may be estimated from data.

3.3  Influence  of  the  noise  powers 

Note, however, that equations (17) hold only if the noise powers are not negligible. If they are negligible:

\[ F_1(f(n),g(n)) = F_1(1/g(n),1/f(n)) = 0, \qquad F_2(f(n),g(n)) = F_2(1/g(n),1/f(n)) = 0 \tag{18} \]

A simpler solution may be proposed in this case:

\[ E\{(X_i(n) - \alpha(n)Y_i(n))\,(Y_i(n) - \beta(n)X_i(n))^*\,e^{R(X_i(n))+I(X_i(n))}\} = 0 \tag{19} \]

The hypothesis of uncorrelated sources leads to:

\[ E\{(X_i(n) - \alpha(n)Y_i(n))\,(Y_i(n) - \beta(n)X_i(n))^*\} = 0 \tag{20} \]

(19) and (20) are sufficient to solve the problem. They yield a second-order equation in β(n)* whose solutions are g(n)* and (1/f(n))*. The influence of the noise powers is illustrated with simulation results in §4.
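One way to see where the second-order equation in β(n)* comes from is sketched below, assuming that both (19) and (20) are set to zero. The shorthand a_k and b_k is introduced here for illustration only and is not the authors' notation; the time index n is dropped.

\[ a_0 = E\{X_iY_i^*\},\quad a_1 = E\{|Y_i|^2\},\quad a_2 = E\{Y_iX_i^*\},\quad a_3 = E\{|X_i|^2\} \]

and b_0, ..., b_3 denote the same moments weighted by e^{R(X_i)+I(X_i)} inside the expectation. Expanding the products gives

\[ a_0 - \alpha\,a_1 + \alpha\beta^* a_2 - \beta^* a_3 = 0, \qquad b_0 - \alpha\,b_1 + \alpha\beta^* b_2 - \beta^* b_3 = 0 \]

Solving each equation for α and equating the two expressions,

\[ \alpha = \frac{a_0 - \beta^* a_3}{a_1 - \beta^* a_2} = \frac{b_0 - \beta^* b_3}{b_1 - \beta^* b_2} \]

which, after cross-multiplication, is quadratic in β*:

\[ (a_3 b_2 - a_2 b_3)\,(\beta^*)^2 + (a_2 b_0 + a_1 b_3 - a_0 b_2 - a_3 b_1)\,\beta^* + (a_0 b_1 - a_1 b_0) = 0 \]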

3.4  Determination  of  the  exponential  functions 

The nonlinear functions must be separated into a product of functions of Si(n), Ri(n) and Vi(n). This allows us to eliminate first the terms containing the sources Si(n) and Ri(n), and then the terms containing the noise signals Vi(n) and Wi(n). The nonlinearity is therefore necessarily chosen as a complex exponential function, e^{H(Xi(n))}.

We see from the previous computations that e^{H(Xi(n))} must be separated into a product of terms such as e^{H(Si(n))} and e^{H(Ri(n))} in order to obtain a direct solution for the complex gains f(n) and g(n). The only constraint is therefore the linearity of the function H. We propose in the paper a particular function H; another one is proposed in [7]: R[Xi(n)]+I[Xi(n)], where R and I are respectively the real and imaginary parts of the complex Xi(n). The first one is more robust with regard to the choice of the coefficients λ.

4  Simulation  results 

We verify the robustness of the algorithm against non-Gaussian noise. The observations are instantaneous mixtures of complex time sequences. One source takes the four equiprobable values (1, -1, j, -j). The second one is asymmetrically distributed, with zero mean and unit variance: it is a QAR1 process, constructed from a Gaussian noise. The noise signals are uniformly distributed with zero mean and unit variance. The complex coefficients f and g are equal to (2,1) and (0.5,1).

In the proposed algorithm, some statistical moments must be estimated in order to compute the quantities J0(n), J1(n), NJ0(n), NJ1(n), K0(n), K3(n), NK0(n), and NK3(n). For each sample i, we estimated these moments using time-averages in a recursive way. We observe (figure 1) the error |f - f̂(i)| between f, of value (2,1), and its estimate f̂(i). The error has been drawn for two different noise powers. These powers have been computed in order to obtain the two signal to noise ratios, 0 dB and 5 dB, between source powers and noise powers. We notice that the convergence speed depends on the signal to noise ratio, as expected. Convergence takes about 2000 samples for 5 dB and 3500 for 0 dB.
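The recursive time-averaging mentioned above can be sketched as a running mean. The update rule below is the standard one for an unweighted average; whether the authors used exactly this form (for instance, without a forgetting factor) is an assumption, and the sample values are made up for illustration.

```python
def recursive_mean(samples):
    """Yield the running time-average m(i) = m(i-1) + (v_i - m(i-1)) / i."""
    m = 0
    for i, v in enumerate(samples, start=1):
        m = m + (v - m) / i
        yield m

# Hypothetical usage: running estimate of E{X Y*} from paired complex samples.
x = [1 + 1j, -1 + 0.5j, 0.5 - 1j, 2 + 0j]
y = [0.5 + 0j, 1 - 1j, -1 + 1j, 1 + 2j]
estimates = list(recursive_mean(a * b.conjugate() for a, b in zip(x, y)))
```

Each moment needed by the algorithm would get its own running average of this kind, so the coefficients can be refreshed at every sample without storing past data.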


Figure 1: Estimation error of f as a function of the number of time samples

Secondly, we verify the robustness of the algorithm against noise. For that, "asymptotic" performances are analyzed. We show residual errors, computed with 8192 samples, |f - f̂(8192)|, for several signal to noise ratios from -8 dB to 4 dB.


Figure 2: Residual error as a function of signal to noise ratio (dB)


We notice in figure 2 that the algorithm is quite robust against noise (low residual error for any signal to noise ratio). However, performance is especially good for a signal to noise ratio higher than 0 dB.

According to what has been shown previously in §3.3, the method is applicable if the noise powers are not negligible. For the proposed example in figure 3, the method remains applicable up to a 10 dB signal to noise ratio. For a signal to noise ratio higher than 10 dB, the residual error rapidly increases, and it is better to solve equations (19) and (20).

5  Conclusion 

We propose a method of source separation that takes additive noises into account. No hypotheses on their probability densities are made. The main characteristic of the method is the use of an objective function based on nonlinear (exponential) functions. Specific properties of these functions and the hypothesis of independent sources make it possible to write an analytic solution for the estimation of the two linear filters that combine the sources. The convergence speed of the proposed method and its robustness against noise are then illustrated with several simulation results.
Appendix

Let us define:

\[ G_1(\alpha(n),\beta(n)) = E\{X'_i(n)\,(Y'_i(n))^*\,e^{H(X_i(n))}\} = E\{(X_i(n) - \alpha(n)Y_i(n))\,(Y_i(n) - \beta(n)X_i(n))^*\,e^{H(X_i(n))}\} \tag{1} \]

where * represents the conjugate of a complex value.

In the computation of G1(f(n),g(n)) and G1(1/g(n),1/f(n)), we notice the existence of some terms which depend only on the sources, such as $E\{S_i(n)\,e^{H(S_i(n))}\}$ and $E\{R_i(n)^*\,e^{H(f(n)R_i(n))}\}$.

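Now replace the observations $X_i(n)$, $Y_i(n)$ with $\bar X_i(n)$, $\bar Y_i(n)$, where:

\[ \bar X_i(n) = X_i(n) - \frac{E\{X_i(n)\,e^{H(X_i(n))}\}}{E\{e^{H(X_i(n))}\}} \tag{2} \]

\[ \bar Y_i(n) = Y_i(n) - \frac{E\{Y_i(n)\,e^{H(X_i(n))}\}}{E\{e^{H(X_i(n))}\}} \tag{3} \]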
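The effect of the offsets in (2) and (3) can be checked numerically with sample means. The sketch below assumes the weight e^{R(x)+I(x)} (the function attributed to [7] in section 3.4) in place of the generic exponential; the helper names and the sample values are hypothetical.

```python
import math

def weight(x):
    """Exponential weight e^{Re(x) + Im(x)}, assumed here for illustration."""
    return math.exp(x.real + x.imag)

def sample_mean(values):
    values = list(values)
    return sum(values) / len(values)

def center(signal, reference):
    """Subtract E{signal * w(reference)} / E{w(reference)} (sample means)."""
    w = [weight(r) for r in reference]
    offset = sample_mean(s * wi for s, wi in zip(signal, w)) / sample_mean(w)
    return [s - offset for s in signal]

# Hypothetical observations.
x = [1 + 1j, -1 + 0.5j, 0.5 - 1j, 2 + 0j, -0.5 - 0.5j]
y = [0.5 + 0j, 1 - 1j, -1 + 1j, 1 + 2j, 0.25 + 0j]
x_bar = center(x, x)  # analogue of equation (2)
y_bar = center(y, x)  # analogue of equation (3)
```

By construction, the weighted sample means of `x_bar` and `y_bar` vanish, which is the property used later to cancel the source-only terms.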


Consequently, a new model for source separation is given by:

\[ \bar X_i(n) = \bar S_i(n) + f(n)\,\bar R_i(n) + \bar V_i(n) \tag{4} \]

\[ \bar Y_i(n) = g(n)\,\bar S_i(n) + \bar R_i(n) + W_i(n) \tag{5} \]

Signals $\bar S_i(n)$, $\bar R_i(n)$ and $\bar V_i(n)$ are deduced from $S_i(n)$, $R_i(n)$ and $V_i(n)$ by subtraction of a nonzero value such that $\bar S_i(n)\,e^{H(S_i(n))}$, $\bar R_i(n)\,e^{H(f(n)R_i(n))}$ and $\bar V_i(n)\,e^{H(V_i(n))}$ have zero means. The noise signal $W_i(n)$ is not modified, since $W_i(n)$ is independent of $X_i(n)$. Thus $W_i(n)\,e^{H(X_i(n))}$ already has a zero mean.

In order to eliminate the terms containing Si(n), Ri(n) in G1(f(n),g(n)) and G1(1/g(n),1/f(n)), we propose to modify the function F1 by replacing $(X_i(n) - \alpha(n)Y_i(n))$ with $(\bar X_i(n) - \alpha(n)\bar Y_i(n))$.

Consider now:

\[ G_2(\alpha(n),\beta(n)) = E\{(\bar X_i(n) - \alpha(n)\bar Y_i(n))\,(Y_i(n) - \beta(n)X_i(n))^*\,e^{H(X_i(n))}\} \tag{6} \]

In the computation of G2(f(n),g(n)) and G2(1/g(n),1/f(n)), Si(n) and Ri(n) appear in the terms $E\{\bar S_i(n)\,e^{H(S_i(n))}\}$ and $E\{\bar R_i(n)^*\,e^{H(f(n)R_i(n))}\}$, which are zero, owing to the definition of $\bar S_i(n)$ and $\bar R_i(n)$ and the hypothesis of independence of Si(n), Ri(n), Vi(n) and Wi(n), leaving:

\[ G_2(f(n),g(n)) = -g(n)^*\,E\{\bar V_i(n)V_i(n)^*\,e^{H(X_i(n))}\} - f(n)\,E\{|W_i(n)|^2\}\,E\{e^{H(X_i(n))}\} \tag{7} \]

\[ G_2(1/g(n),1/f(n)) = -\frac{1}{f(n)^*}\,E\{\bar V_i(n)V_i(n)^*\,e^{H(X_i(n))}\} - \frac{1}{g(n)}\,E\{|W_i(n)|^2\}\,E\{e^{H(X_i(n))}\} \tag{8} \]

Subsequently, we eliminate the terms containing Wi(n) in (7) and (8). For that, we use the fact that Si(n), Ri(n), Vi(n) and Wi(n) are uncorrelated, which provides:

\[ -E\{(X_i(n) - f(n)Y_i(n))\,(Y_i(n) - g(n)X_i(n))^*\} = f(n)\,E\{|W_i(n)|^2\} + g(n)^*\,E\{|V_i(n)|^2\} \tag{9} \]

Now substitute $E\{|W_i(n)|^2\}$ from (9) into (7) and (8), which yields:

\[ G_2(f(n),g(n)) - E\{e^{H(X_i(n))}\}\,E\{[X_i(n) - f(n)Y_i(n)]\,[Y_i(n) - g(n)X_i(n)]^*\} = g(n)^*\,L(V_i(n),X_i(n)) \tag{10} \]

where:

\[ L(V_i(n),X_i(n)) = -E\{\bar V_i(n)V_i(n)^*\,e^{H(X_i(n))}\} + E\{|V_i(n)|^2\}\,E\{e^{H(X_i(n))}\} \]

\[ G_2(1/g(n),1/f(n)) - E\{e^{H(X_i(n))}\}\,E\{[X_i(n) - \tfrac{1}{g(n)}Y_i(n)]\,[Y_i(n) - \tfrac{1}{f(n)}X_i(n)]^*\} = \frac{1}{f(n)^*}\,L(V_i(n),X_i(n)) \tag{11} \]


As a result, let the new two-dimensional function G3 be defined by:

\[ G_3(\alpha(n),\beta(n)) = G_2(\alpha(n),\beta(n)) - E\{e^{H(X_i(n))}\}\,E\{[X_i(n) - \alpha(n)Y_i(n)]\,[Y_i(n) - \beta(n)X_i(n)]^*\} \tag{12} \]

\[ = E\{[\bar X_i(n) - \alpha(n)\bar Y_i(n)]\,[Y_i(n) - \beta(n)X_i(n)]^*\,e^{H(X_i(n))}\} - E\{e^{H(X_i(n))}\}\,E\{[X_i(n) - \alpha(n)Y_i(n)]\,[Y_i(n) - \beta(n)X_i(n)]^*\} \]

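G3 may also be written as:

\[ G_3(\alpha(n),\beta(n)) = J_0(n) + J_1(n)\,\alpha(n) + J_2(n)\,\alpha(n)\beta(n)^* + J_3(n)\,\beta(n)^* \tag{13} \]

with:

\[ J_0(n) = E\{X_i(n)Y_i(n)^* e^{H(X_i(n))}\} - E\{e^{H(X_i(n))}\}\,E\{X_i(n)Y_i(n)^*\} - E\{X_i(n)e^{H(X_i(n))}\}\,E\{e^{H(X_i(n))}\}^{-1}\,E\{Y_i(n)^* e^{H(X_i(n))}\} \]

\[ J_1(n) = -E\{|Y_i(n)|^2 e^{H(X_i(n))}\} + E\{e^{H(X_i(n))}\}\,E\{|Y_i(n)|^2\} + \frac{E\{Y_i(n)e^{H(X_i(n))}\}\,E\{Y_i(n)^* e^{H(X_i(n))}\}}{E\{e^{H(X_i(n))}\}} \]

\[ J_2(n) = J_0(n)^* \]

\[ J_3(n) = -E\{|X_i(n)|^2 e^{H(X_i(n))}\} + E\{e^{H(X_i(n))}\}\,E\{|X_i(n)|^2\} \]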


We observe from (9) and (10) that:

\[ \frac{G_3(f(n),g(n))}{G_3(1/g(n),1/f(n))} = f(n)^*\,g(n)^* \tag{14} \]

Consequently, the nonlinear function G3(α(n),β(n)) verifies:

\[ G_3(f(n),g(n)) = g(n)^* f(n)^* \left[J_0(n) + J_1(n)\,\frac{1}{g(n)} + J_2(n)\,\frac{1}{g(n)}\,\frac{1}{f(n)^*} + J_3(n)\,\frac{1}{f(n)^*}\right] \tag{15} \]

where J0(n), J1(n), J2(n) and J3(n) depend only on the observations Xi(n) and Yi(n).

A similar expression may be written for G3(1/g(n),1/f(n)):

\[ G_3(1/g(n),1/f(n)) = \frac{1}{g(n)^*}\,\frac{1}{f(n)^*}\left[J_0(n) + J_1(n)\,f(n) + J_2(n)\,f(n)g(n)^* + J_3(n)\,g(n)^*\right] \tag{16} \]

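(13), (14) and (15) naturally provide a nonlinear equation of which the pairs (f(n),g(n)) and (1/g(n),1/f(n)) are necessarily solutions:

\[ G_3(\alpha(n),\beta(n)) = \alpha(n)^*\beta(n)^*\,G_3(1/\beta(n),1/\alpha(n)) \tag{17} \]

It may be developed as:

\[ \left[1 + \frac{J_0(n)}{J_1(n)}\,\beta(n)\right]\left[1 - \alpha(n)^*\beta(n)^*\right] = \left[1 + \frac{J_0(n)^*}{J_1(n)}\,\beta(n)^*\right]\left[1 - \alpha(n)\beta(n)\right] \]

We then obtain equation (15).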




Blind  Separation  of  Convolutive  Mixtures:  A  Gauss-Newton  Algorithm 


Sergio  Cruces 

TSC  Group,  ESI  Telecomunicacion 
Universidad  de  Sevilla 
Av.  Reina  Mercedes,  41012-Sevilla,  Spain 
sergio@viento.us.es 


Abstract 

This paper addresses the blind separation of convolutive mixtures of independent and non-Gaussian sources. We present a block-based Gauss-Newton algorithm which is able to obtain a separation solution using only a specific set of output cross-cumulants and the hypothesis of soft mixtures. The order of the cross-cumulants is chosen to obtain a particular form of the Jacobian matrix that ensures convergence and reduces computational burden. The method can be seen as an extension and improvement of Van Gerven's symmetric adaptive decorrelation (SAD) method. Moreover, the convergence analysis presented in the paper provides a theoretical background to derive an improved version of the Nguyen-Jutten algorithm.


1.  Introduction 

Blind source separation is receiving growing interest due to its applications in diverse fields such as array processing, multiuser communications, etc. ([10]-[8]). While most of the work has been developed in the context of instantaneous mixtures, the more difficult problem of convolutive mixtures has received less attention. Recently, Van Gerven et al. [15] have proposed and analyzed the Symmetric Adaptive Decorrelation (SAD) algorithm that only uses second order statistics and exhibits interesting signal separation properties. However, it is well known that methods based on second order statistics are not sufficient to solve the blind source separation problem and higher order statistics are necessary. For this reason, Nguyen and Jutten [12] have proposed an algorithm (NJ) that cancels fourth order cross-cumulants of the output signals to achieve separation. Nevertheless, the convergence of this algorithm has not been analyzed yet.

•This  work  has  been  supported  by  the  National  Research  Plan  of  Spain, 
CICYT  (Grant  No.  TIC96-05(X)-C10-08)  and  Xunta  de  Galicia  (Grant  No. 
In this paper we present a method that generalizes and improves the SAD algorithm using higher order cross-cumulants. From the given method, one can also derive an algorithm which is similar to the NJ algorithm for two sources except for a term that ensures asymptotic stability.

2.  Signal  model 

Let us assume the signal model presented in Figure 1. There are N independent sources $\mathbf{s}[n] = [s_1[n],\ldots,s_N[n]]^T$ where at most one is Gaussian. The Darmois-Skitovich theorem [3] ensures that the sources can be separated by imposing pairwise independence between them. The sources are convolutively mixed through a multichannel linear time-invariant system, with memory, to obtain the observations vector $\mathbf{x}[n] = [x_1[n],\ldots,x_M[n]]^T$, where $M \ge N$. The impulse response of the mixing system is characterized by the sequence of M × N matrices $\mathbf{A}[n] = [a_{ij}[n]]$. We will assume that the mixing system is FIR and (possibly) non-causal. As a consequence, $\mathbf{A}[n] \neq 0$ for $-L_{a1} \le n \le L_{a2}$, and the relationship between sources and observations can be written as

\[ \mathbf{x}[n] = \mathbf{A}[n] * \mathbf{s}[n] = \sum_{k=-L_{a1}}^{L_{a2}} \mathbf{A}[k]\,\mathbf{s}[n-k] \]

In order to separate the sources, the observations are processed by another multichannel LTI system, with memory, characterized by its M × N impulse response sequence matrix $\mathbf{B}[n]$. Again, the separating system will be FIR, (possibly) non-causal and non-zero in the interval $-L_{b1} \le n \le L_{b2}$. Therefore, the output vector $\mathbf{y}[n] = [y_1[n],\ldots,y_N[n]]^T$ is given by

\[ \mathbf{y}[n] = \mathbf{B}[n] * \mathbf{x}[n] = \sum_{k=-L_{b1}}^{L_{b2}} \mathbf{B}[k]\,\mathbf{x}[n-k] \]
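The two convolution sums above share the same structure, which can be sketched with a small multichannel FIR routine. The function name, the list-of-matrices layout and the zero-padding outside the record are assumptions made for this illustration.

```python
def mimo_fir(taps, first_lag, signal):
    """Multichannel FIR filter: out[n] = sum_k taps[k] * signal[n - k].

    taps      : list of matrices (lists of rows), one per lag
    first_lag : lag k associated with taps[0] (negative if non-causal)
    signal    : list of input vectors, signal[n][j]; samples outside the
                record are treated as zero
    """
    n_out = len(taps[0])
    n_in = len(taps[0][0])
    out = []
    for n in range(len(signal)):
        acc = [0.0] * n_out
        for idx, mat in enumerate(taps):
            m = n - (first_lag + idx)  # index of the input sample s[n - k]
            if 0 <= m < len(signal):
                for i in range(n_out):
                    for j in range(n_in):
                        acc[i] += mat[i][j] * signal[m][j]
        out.append(acc)
    return out

# Hypothetical 2x2 mixture with one anti-causal tap (k = -1) and k = 0.
A = [[[0, 1], [0, 0]],   # A[-1]
     [[1, 0], [0, 1]]]   # A[0]
s = [[1, 10], [2, 20], [3, 30]]
x = mimo_fir(A, -1, s)
```

The same routine applied to `x` with the separating taps would give the outputs y[n].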

There are several indeterminacies in the source separation problem [14]. Our approach can handle the source ordering indeterminacy, but we must avoid the scaling and delay indeterminacies. To this end, we will suppose that the diagonal terms of the separation sequence matrices B[k] and those of A[k]P are equal to the unit impulse δ[k] (P is an unknown permutation matrix).

Figure 1. Feed-forward system model.

Once the signals y[n] are separated, it is necessary to introduce a post-processing system D(z) in order to remove the undesired correlations originated by the separation process. For this task, it can be shown [5] that the D(z) transfer function should be diagonal with elements given by

\[ D_{ii}(z) = \frac{\mathrm{Adj}_{ii}(\mathbf{B}(z))}{\mathrm{Det}[\mathbf{B}(z)]} \tag{1} \]

where $\mathrm{Adj}_{ii}(\cdot)$ is the adjoint operator on the (i,i) matrix element. At the output of the post-processing system we obtain the vector ŝ[n] of estimated sources.

The following hypotheses will be needed for the algorithm to perform adequately:

1. The mixture is soft in the sense that each observed signal receives a dominant contribution from one of the sources. This typically occurs when each sensor is closer to a different source and the sources have similar power.

2. There is a subset of cumulants of the sources in which the cumulants of the same order do not differ greatly in magnitude. This is obviously true when the sources are identically distributed.

The soft mixture assumption is appropriate for the separation of convolutive mixtures, since a sufficient condition for the stability of the post-processing filters also depends on this hypothesis. It can also be shown that the soft mixture assumption leads to a D(0) that is not far from the identity matrix, a fact that will be important in later demonstrations.

The separation filter length $L_b = L_{b1} + L_{b2} + 1$ is chosen to match the length of the filters in the numerator of the inverse multichannel transfer function. Therefore $L_{b1} = L_{a1}(N-1)$ and $L_{b2} = L_{a2}(N-1)$ if M = N, and $L_{b1} = 3L_{a1}(N-1)$ and $L_{b2} = 3L_{a2}(N-1)$ when M > N. The inverse of a FIR multichannel transfer function has some IIR components; these components will be provided by B(z) in the two sources case. Nevertheless, in the multiple sources case, our approach will be only approximate. We cannot reach the solution exactly but, under the soft mixture hypothesis, the IIR part will be negligible or can be approximated by making the separation filters slightly longer.

3. Statistical dependence measure

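We define the cumulant tensor $C_{\alpha_1,\ldots,\alpha_l}(\cdot)$ as the tensor whose $(i_1,\ldots,i_l)$ element is $\mathrm{Cum}(y_{i_1}[n] \times \alpha_1, \ldots, y_{i_l}[n] \times \alpha_l)$, the $(\alpha_1,\ldots,\alpha_l)$-order cross-cumulant between the signals $y_{i_1}[n],\ldots,y_{i_l}[n]$.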

We propose that the adjustable elements of the separating matrix B(z) be selected to jointly diagonalize the set of cumulant matrices $C_{\alpha,\beta}(\mathbf{y}[n],\mathbf{y}[n-k])$ for the different lags $k = -L_{b1},\ldots,L_{b2}$ and for the set Ω of $n_\Omega$ pairs (α, β) of natural numbers. To this end, we propose to minimize the following cost function

\[ \Phi_\Omega = \sum_{(\alpha,\beta)\in\Omega} w_{\alpha,\beta} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j\neq i}}^{N} \sum_{k=-L_{b1}}^{L_{b2}} C^2_{\alpha,\beta}(y_i[n], y_j[n-k]) \tag{2} \]

The $w_{\alpha,\beta}$ are a set of weighting coefficients. Note that $\Phi_\Omega$ can be interpreted as a statistical dependence measure of the output vector y[n]. Initially, we will consider the set Ω containing all the possible pairs (α, β). However, it will be shown that the minimization over the subset (1, β) is sufficient to ensure separation under the soft mixture assumption. Trivial non-separation solutions, where any of the outputs is equal to zero, occur when the mixing or separating matrices are not full row rank. Convergence to these solutions is avoided by the soft mixture hypothesis since, when this hypothesis holds, we are in the basin of attraction of the correct separation solution.

4.  Minimization  algorithm 

For simplicity, we will rewrite the dependence measure in vector form. Let us define the vector $\mathbf{z}_{(\alpha,\beta)} = \{C_{\alpha,\beta}(y_i[n], y_j[n-k]);\ i, j|_{j\neq i} = 1,\ldots,N,\ k = -L_{b1},\ldots,L_{b2}\}$. If we now define the vector $\mathbf{z} = [\mathbf{z}^T_{(\alpha,\beta)_1},\ldots,\mathbf{z}^T_{(\alpha,\beta)_{n_\Omega}}]^T$ with $(\alpha,\beta)_i \in \Omega$, the dependence measure can be written as the inner product $\mathbf{z}^T\mathbf{z}$. It is also convenient to rearrange the separation variables in a vector $\mathbf{b} = \{b_{ij}[k];\ i = 1,\ldots,N,\ j|_{j\neq i} = 1,\ldots,M,\ k = -L_{b1},\ldots,L_{b2}\}$.

Different algorithms can be used to minimize the dependence measure. Due to the soft mixture hypothesis we are initially close to the separation solution and the vector z of output cross-cumulants is small (with norm close to zero). In this situation, we can use the Gauss-Newton (GN) method [6] to find the desired minimum, because the Hessian matrix of the adaptation can be well approximated from the Jacobian matrix. This property and the quadratic convergence of the GN method reduce the computational burden.

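The Gauss-Newton method uses the gradient and the Hessian of the dependence measure with respect to the separation coefficients. The gradient is given by $\nabla_{\mathbf{b}}(\mathbf{z}^T\mathbf{z}) = 2\mathbf{J}\mathbf{z}$, where $\mathbf{J} = \nabla_{\mathbf{b}}\mathbf{z}$ is the Jacobian matrix. The global Jacobian matrix can also be defined in terms of the sub-Jacobian matrices $\mathbf{J}_{(\alpha,\beta)} = \nabla_{\mathbf{b}}\mathbf{z}_{(\alpha,\beta)}$ as $\mathbf{J} = [\mathbf{J}_{(\alpha,\beta)_1},\ldots,\mathbf{J}_{(\alpha,\beta)_{n_\Omega}}]$. Close to the solution, the Hessian can be approximated by $\mathbf{H} \approx 2\mathbf{J}\mathbf{J}^T$. Then, the Gauss-Newton method consists of the following iteration

\[ \mathbf{b}|_{n+1} = \mathbf{b}|_n + \mu\,\Delta, \qquad \mathbf{H}\cdot\Delta = -2\mathbf{J}\mathbf{z} \tag{4} \]

where $\mu \in (0,2)$ is the step-size. Denoting by $\mathbf{J}^{T\#}$ the pseudo-inverse of $\mathbf{J}^T$, the increment Δ is calculated as $\Delta = -\mathbf{J}^{T\#}\mathbf{z}$ or, equivalently, by solving the following system of linear equations

\[ \left(\sum_{i=1}^{n_\Omega} \mathbf{J}_{(\alpha,\beta)_i}\mathbf{J}^T_{(\alpha,\beta)_i}\right)\Delta = -\sum_{i=1}^{n_\Omega} \mathbf{J}_{(\alpha,\beta)_i}\,\mathbf{z}_{(\alpha,\beta)_i} \]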
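The iteration (4) can be illustrated on a toy least-squares problem. The sketch below is not the separation algorithm itself: the three residuals are made up so that the minimum of z^T z is at b = (2, 1), and the 2-by-2 normal equations are solved by Cramer's rule. Only the convention (gradient 2Jz, Hessian approximated by 2JJ^T, step Δ solving JJ^T Δ = -Jz) follows the text.

```python
def residuals(b):
    """Toy residual vector z(b); all entries vanish at b = (2, 1)."""
    b0, b1 = b
    return [b0 + b1 - 3.0, b0 - b1 - 1.0, b0 * b1 - 2.0]

def jacobian(b):
    """J[p][q] = d z_q / d b_p, so that the gradient of z^T z is 2 J z."""
    b0, b1 = b
    return [[1.0, 1.0, b1],
            [1.0, -1.0, b0]]

def gauss_newton_step(b, mu=1.0):
    z = residuals(b)
    J = jacobian(b)
    # Normal equations (J J^T) delta = -J z.
    g = [[sum(J[p][q] * J[r][q] for q in range(3)) for r in range(2)]
         for p in range(2)]
    rhs = [-sum(J[p][q] * z[q] for q in range(3)) for p in range(2)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    delta = [(rhs[0] * g[1][1] - g[0][1] * rhs[1]) / det,
             (g[0][0] * rhs[1] - g[1][0] * rhs[0]) / det]
    return [b[0] + mu * delta[0], b[1] + mu * delta[1]]

b = [1.5, 0.5]  # start close to the solution, as the soft mixture case assumes
for _ in range(10):
    b = gauss_newton_step(b)
```

Starting close to the solution mirrors the role of the soft mixture hypothesis: the residual vector is small, so the JJ^T approximation of the Hessian is accurate and a few iterations suffice.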

4.1.  Jacobian  structure 

In this section, we will choose an adequate set Ω to reduce the computational burden of the algorithm (4). We will exploit the Jacobian structure for this selection, showing that a few elements of the Jacobian matrix strongly dominate the rest. This makes it possible to approximate the Jacobian matrix by a strongly sparse matrix. As a consequence, the computational burden of the algorithm will be reduced for two reasons: there are fewer cumulants to estimate and there exist efficient methods to solve sparse linear systems [7].

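Each sub-Jacobian $\mathbf{J}_{(\alpha,\beta)}$ is a $L_bM(N-1)$-by-$L_bN(N-1)$ rectangular matrix. Its elements are proportional to

\[ \frac{\partial C_{\alpha,\beta}(y_i[n], y_j[n-k])}{\partial b_{rs}[m]} = \delta_{ir}\,\alpha\,C_{1,\alpha-1,\beta}(x_s[n-m], y_i[n], y_j[n-k]) + \delta_{jr}\,\beta\,C_{1,\alpha,\beta-1}(x_s[n-k-m], y_i[n], y_j[n-k]) \tag{5} \]

where $\delta_{ij} = \{1 \text{ if } i = j,\ 0 \text{ else}\}$, and where the indices (i,j,k,r,s,m) belong to the range $R = \{i,j,r,s = 1,\ldots,N;\ m,k = -L_{b1},\ldots,L_{b2}\}$.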

The resulting expression shows that, for (α, β) both greater than 1 and under the assumption of soft mixtures, all the Jacobian elements are close to zero, since the signals $y_i[n]$ and $y_j[n]$ will have a small and decreasing dependence on each other. Therefore, the dependence measure has a slow variation, or low sensitivity, with respect to the separation coefficients, and this makes this kind of pair undesirable in a gradient-based algorithm. When α or β is equal to 1, the Jacobian is no longer close to zero, which avoids the former problem. We will set α = 1 since this choice leads to a well-behaved structure for the Jacobian matrix while β = 1 does not. Substituting α = 1 in expression (5) and neglecting the non-dominant terms around the separation solution, we arrive at this first approximation for the Jacobian elements

\[ \frac{\partial C_{1,\beta}(y_i[n], y_j[n-k])}{\partial b_{rs}[m]} \approx \delta_{ir}\,C_{1,\beta}(x_s[n-m], y_j[n-k]) \]

The above expression is not exact for the case β = 1, but even in that case it constitutes a valid approximation for the Jacobian.

When the sources can be considered locally stationary with respect to the delay $L_b$ (e.g. voice signals) or when we work with a block-based method of $n_d$ samples where $n_d \gg L_b$, the cross-cumulant estimates can be considered stationary. This simplifies the last approximation to

\[ \frac{\partial C_{1,\beta}(y_i[n], y_j[n-k])}{\partial b_{rs}[m]} \approx \begin{cases} \delta_{ir}\,C_{1,\beta}(x_s[n-m+k], y_j[n]) & \text{if } k < m \\ \delta_{ir}\,C_{1,\beta}(x_s[n], y_j[n-k+m]) & \text{if } k \ge m \end{cases} \]

The Jacobian can now be computed from $n_\Omega\,2L_bM(N-1)$ cumulant estimates. We see that the structure of the sub-Jacobians $\mathbf{J}_{(\alpha,\beta)}$ is sparse: they are block Toeplitz with blocks of dimension M(N-1), and each block is block diagonal, formed by sub-blocks of dimension (N-1).

When the sources are white with respect to their cumulants of order 1 + β, or when the mixture is instantaneous, the approximation (6) can be further simplified in the following way

\[ \frac{\partial C_{1,\beta}(y_i[n], y_j[n-k])}{\partial b_{rs}[m]} \approx \delta_{ir}\,\delta_{km}\,C_{1,\beta}(x_s[n], y_j[n]) \tag{7} \]

In this case, the Jacobian requires only $n_\Omega M(N-1)$ cumulant computations. The sub-Jacobian matrices $\mathbf{J}_{(\alpha,\beta)}$ are block diagonal with blocks of dimension N-1. The proposed algorithm for the two white sources case reduces to the coefficients adaptation:

\[ b_{ij}[k]\big|_{n+1} = b_{ij}[k]\big|_n - \mu\,\frac{\sum_{(1,\beta)\in\Omega} w_{1,\beta}\,C_{1,\beta}(x_j[n], y_j[n])\,C_{1,\beta}(y_i[n], y_j[n-k])}{\sum_{(1,\beta)\in\Omega} w_{1,\beta}\,C^2_{1,\beta}(x_j[n], y_j[n])}, \qquad i, j|_{j\neq i} = 1,\ldots,2,\quad k = -L_{b1},\ldots,L_{b2} \tag{8} \]

Recall that this iteration only requires $2n_\Omega(L_b+1)$ cumulants!
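The cross-cumulants entering the update (8) have to be estimated from data. A minimal sketch of a sample estimator is given below, assuming real, zero-mean sequences and covering only β = 1 and β = 3; the function names are hypothetical and the block-based estimation used in the paper may differ in detail.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def cross_cumulant_1_beta(x, y, beta):
    """Sample estimate of C_{1,beta}(x, y) for real, zero-mean sequences.

    Only beta = 1 (cross-correlation) and beta = 3 (fourth-order
    cross-cumulant) are covered in this sketch.
    """
    if beta == 1:
        return mean(a * b for a, b in zip(x, y))
    if beta == 3:
        # cum(x, y, y, y) = E{x y^3} - 3 E{x y} E{y^2} for zero-mean signals
        return (mean(a * b ** 3 for a, b in zip(x, y))
                - 3 * mean(a * b for a, b in zip(x, y)) * mean(b * b for b in y))
    raise ValueError("beta must be 1 or 3 in this sketch")
```

For a lagged cumulant such as C_{1,β}(y_i[n], y_j[n-k]), the second sequence would simply be shifted by k samples before calling the estimator.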
Figure 2. Feed-backward system model.


5. Link with other separation methods


So far, we have used the feed-forward (FF) separation structure (Figure 1). Another possibility is the feed-backward (FB) filtering structure shown in Figure 2. This structure has the advantage of avoiding post-filtering by a post-processing matrix, but its analysis is not simple due to the feedback. In this structure, the transfer function of the separation filter is $(\mathbf{I} + \mathbf{C}(z))^{-1}$, where C(z) is a matrix with the same dimensions as A(z) and zero diagonal elements. We can establish a connection between our feed-forward algorithm and a similar feed-backward one under the assumption of soft mixtures.

The relation between the feed-backward and the feed-forward structure is given by $\mathbf{D}(z)\mathbf{B}(z) = (\mathbf{I} + \mathbf{C}(z))^{-1}$. For the two sources case, the solution of this equation reduces to $\mathbf{C}(z) = \mathbf{I} - \mathbf{B}(z)$. This establishes the connection between both filtering approaches. For more than two sources and soft mixtures, we may approximate $(\mathbf{I} + \mathbf{C}(z))^{-1}$ by the linear part of its Taylor series expansion around $\mathbf{C}(z) = 0$, which is $(\mathbf{I} + \mathbf{C}(z))^{-1} \approx \mathbf{I} - \mathbf{C}(z)$. Since under the soft mixture hypothesis we can approximate the post-processing matrix by the identity, $\mathbf{D}(z) \approx \mathbf{I}$, the connection between both filtering approaches still takes the form $\mathbf{C}(z) \approx \mathbf{I} - \mathbf{B}(z)$. With this in mind, we can propose a feed-backward separation algorithm simply by updating the coefficients $c_{ij}[k]$ in the same way the feed-forward algorithm does, but changing the adaptation sign and replacing y[n] by ŝ[n]. The algorithm equivalent to (8) in the feed-backward case is

\[ c_{ij}[k]\big|_{n+1} = c_{ij}[k]\big|_n + \mu\,\frac{\sum_{(1,\beta)\in\Omega} w_{1,\beta}\,C_{1,\beta}(x_j[n], \hat s_j[n])\,C_{1,\beta}(\hat s_i[n], \hat s_j[n-k])}{\sum_{(1,\beta)\in\Omega} w_{1,\beta}\,C^2_{1,\beta}(x_j[n], \hat s_j[n])}, \qquad i, j|_{j\neq i} = 1,\ldots,2,\quad k = -L_{a1},\ldots,L_{a2} \tag{9} \]

This algorithm will also be asymptotically stable for β > 1 since it converges to the Gauss-Newton algorithm when approaching the separation solution. When $n_\Omega = 1$ it simplifies to

\[ c_{ij}[k]\big|_{n+1} = c_{ij}[k]\big|_n + \mu\,\frac{C_{1,\beta}(\hat s_i[n], \hat s_j[n-k])}{C_{1,\beta}(x_j[n], \hat s_j[n])}, \qquad i, j|_{j\neq i} = 1,\ldots,2,\quad k = -L_{a1},\ldots,L_{a2} \]

It can be seen that when β = 1 this algorithm reduces to the SAD algorithm [15]. For β = 3 it takes a similar, but improved, form of the NJ algorithm [12]. The improvement over this last algorithm is three-fold: the proposed algorithm has a greater speed of convergence, it works with non-causal mixtures, and it is asymptotically stable under the given hypothesis while NJ is not stable.
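The two-source relation C(z) = I - B(z) quoted in this section can be checked directly. The sketch below assumes, as stated in section 2, that the diagonal terms of B(z) are equal to one; the argument z is dropped and the entry names b_{12}, b_{21}, c_{12}, c_{21} are introduced here for illustration.

\[ \mathbf{B} = \begin{bmatrix} 1 & b_{12} \\ b_{21} & 1 \end{bmatrix}, \qquad D_{ii} = \frac{\mathrm{Adj}_{ii}(\mathbf{B})}{\mathrm{Det}[\mathbf{B}]} = \frac{1}{1 - b_{12}b_{21}} \quad\Rightarrow\quad \mathbf{D}\mathbf{B} = \frac{1}{1 - b_{12}b_{21}} \begin{bmatrix} 1 & b_{12} \\ b_{21} & 1 \end{bmatrix} \]

\[ \mathbf{C} = \begin{bmatrix} 0 & c_{12} \\ c_{21} & 0 \end{bmatrix} \quad\Rightarrow\quad (\mathbf{I} + \mathbf{C})^{-1} = \frac{1}{1 - c_{12}c_{21}} \begin{bmatrix} 1 & -c_{12} \\ -c_{21} & 1 \end{bmatrix} \]

The two expressions coincide for $c_{12} = -b_{12}$ and $c_{21} = -b_{21}$, that is, for $\mathbf{C} = \mathbf{I} - \mathbf{B}$.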

6. Separation example

Before beginning with the separation example we will introduce some additional definitions. The combined multichannel impulse response sequence of the mixing and separating systems will be denoted by $\mathbf{H}[n] = \mathbf{B}[n] * \mathbf{A}[n]$. Let us define the matrix $\mathbf{H}_e$, whose elements are the energies of the elements of H[n]. We normalize the energy matrix in the following way: $\bar{\mathbf{H}}_e = \tfrac{1}{2}(\mathbf{D}_r^{-1}\mathbf{H}_e + \mathbf{H}_e\mathbf{D}_c^{-1})$, where $\mathbf{D}_r$ and $\mathbf{D}_c$ are diagonal matrices whose diagonal elements are the maxima of the rows and columns of $\mathbf{H}_e$, respectively. This performance index is similar to the one used in [11]. The normalized matrix $\bar{\mathbf{H}}_e$ is invariant to scaling by diagonal matrices and to time shifts of the transfer functions. Its maximum coefficients by columns or rows are equal to 1, and will be associated with the normalized energy of the direct transfer functions. The rest will be associated with the normalized energy of the interference transfer functions.
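The performance index described above can be sketched in a few lines. The energy of each element is taken here as the sum of its squared coefficients over all lags, which is an assumption since the defining expression is not legible in the source; the function names are hypothetical.

```python
def energy_matrix(taps):
    """Element-wise energy: sum over lags of the squared coefficients."""
    rows, cols = len(taps[0]), len(taps[0][0])
    return [[sum(abs(mat[i][j]) ** 2 for mat in taps) for j in range(cols)]
            for i in range(rows)]

def normalize_energy(he):
    """Return 0.5 * (Dr^-1 He + He Dc^-1), Dr/Dc holding row/column maxima."""
    row_max = [max(row) for row in he]
    col_max = [max(col) for col in zip(*he)]
    return [[0.5 * (he[i][j] / row_max[i] + he[i][j] / col_max[j])
             for j in range(len(he[0]))] for i in range(len(he))]
```

When the row and column maxima sit on the same (direct-path) entries, those entries map to 1 and the remaining ones give the normalized interference energy.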

Let us suppose we have two independent white source signals s[n] (of 5000 samples) which are mixed through the channel A(z) to obtain a convolutive mixture x[n]. The mixing matrix is formed by five-tap FIR filters and satisfies the soft mixture condition. The initial filter energy matrix is $\bar{\mathbf{H}}_e$ = [ 1 , 0.21 ; 0.98 , 1 ]. We run a few iterations of the feed-forward separation algorithm presented in (8) to separate the sources. We will use the set of cumulants Ω = {(1,3)} for the dependence measure. Figure 3 and Figure 4 show the evolution and performance of the algorithm at each iteration. In Figure 3 we show how the interference energy is reduced whereas the energy of the direct path signals is preserved. The final energy matrix is $\bar{\mathbf{H}}_e$ = [ 1 , 0.0009 ; 0.0009 , 1 ], which is close to the identity matrix. The algorithm does not converge exactly to the identity matrix when using cross-cumulant estimates, due to the finite length of the data. If we have the exact cross-cumulants at each iteration, convergence to the separation solution is achieved with arbitrary precision. Figure 4 presents the minimization of the dependence measure. It can be seen that the theoretical convergence of the algorithm is quadratic and, therefore, it reaches the separation solution in a few iterations.

Figure 3. Coefficients of the filter energy matrix $\bar{\mathbf{H}}_e$ vs. iterations when using: '-o-' cross-cumulant estimates, '-x-' exact cross-cumulants.

7. Conclusions

A new signal separation algorithm is presented for the convolutive mixture of independent and non-Gaussian sources. It was shown that it generalizes and improves a class of convolutive separation algorithms that have been previously proposed, thereby giving a theoretical justification for their convergence in the soft mixtures case. The algorithm was obtained for feed-forward and feed-backward structures, and has the following advantages: it does not depend on the sign of the signal cumulants to converge, it has fast convergence since it is a Gauss-Newton method, and it can work with source signals of arbitrary probability density functions which may have a set of non-zero cumulants, while the moment-based algorithms are limited to signals with even pdfs. Finally, it is robust in the sense that it still allows source separation when the signal cumulants of a certain order are zero, provided that Ω contains other cumulant orders which do not vanish; e.g. this makes it possible to achieve separation when there is one Gaussian signal if (1,1) ∈ Ω.
Abstract

The paper deals with the variance analysis of estimating the third-order cumulant of the output signal of an MA system excited with Bernoulli distributed white noise at the input. This scheme was used to simulate the superimposed systems, or signals, which often arise in practical applications (EMG signal decomposition). Equations were derived which show the dependence of the variances of the third-order output cumulant values on the length of the output signal and on the values of the samples in the impulse response of the unknown system. The structure of the variance inside the third-order cumulant array was observed. The speed and the convergence of the identification procedure could be predicted using these results for the case of superimposed MA systems. Knowledge of the structure of the variance is helpful in designing new identification methods. Some examples are given to show the correctness of the derived equations.


1.  Introduction 

Despite the variety of cumulant-based methods which are
theoretically correct [1, 2], computer simulations show
that the results depend strongly on the values in the
impulse response of the unknown system being identified.
This effect can be observed with all kinds of systems, i.e.
minimum, maximum or mixed phase. Moreover, two very similar
MA systems can give identification results of very different
quality, if the variance of the result is taken as the
measure of quality.

On the other hand, higher-order statistical methods have
proved successful in the decomposition of superimposed
signals modelled as outputs of multichannel systems with
Bernoulli input excitation [3, 4]. To connect the dependence
of the results on the type of impulse response used and on
the type of superimposed MA systems observed, the paper
develops an analytical variance model of the third-order
cumulants, whose outcomes are further used in the evaluation
of a possible signal decomposition.

In  Section  2,  the  variance  for  the  third-order  cumulant 
is  derived,  with  the  Bernoulli  distributed  input  white  noise. 
In  Section  3,  some  simulation  results  are  described  and  the 
structure  of  variances  in  the  third-order  cumulant  array  is 
observed.  The  conclusions  are  gathered  in  Section  4. 

2. Variance analysis for the third-order cumulant

An  MA(q)  system  with  the  Bernoulli  distributed  input  is 
considered: 


where  p  is  the  probability  of  the  positive  pulse  in  the  input 
signal. 

The probability function for the output signal $y(n)$ of the
MA system could be written as:

\[ p(Y_i) = p^{k_{1,i}} (1-p)^{k_{0,i}}, \qquad i = 0, \ldots, 2^{q+1}-1, \tag{2} \]

where the expression $k_{1,i}$ in (2) denotes the number of ones
and the expression $k_{0,i}$ the number of zeros in the binary
representation of $i$. Exactly $2^{q+1}$ different values could be
obtained in the output signal, because each sample in the
system impulse response is multiplied either by the value
$-p$ or by the value $1-p$ of the Bernoulli distributed input
noise. The values in the output signal are computed as:

\[ Y_i = \sum_{j=1}^{k_{1,i}} h(i_j) - p \sum_{j=0}^{q} h(j), \tag{3} \]

where the values $h(i_j)$ represent those samples in the impulse
response which were multiplied by the positive pulse $1-p$
in the input noise. These are the samples in the impulse
response whose corresponding bit in the binary representation
of $i$ is set to 1. Like the input noise, the output signal
is zero mean.
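
The enumeration of output values and their probabilities can be checked with a short sketch. The Python below is only an illustration under stated assumptions, not the authors' code: the function name `output_distribution` is hypothetical, the input is taken to be $1-p$ with probability $p$ and $-p$ otherwise (as described above), and the choice that bit $q-k$ of $i$ selects the pulse applied to $h(k)$ is an assumed convention.

```python
def output_distribution(h, p):
    """Enumerate the 2**(q+1) output values Y_i and their probabilities.

    Assumption (not from the paper): bit (q - k) of i tells whether the
    input sample multiplying h[k] is the positive pulse 1 - p (bit = 1)
    or the negative pulse -p (bit = 0).
    """
    q = len(h) - 1
    values, probs = [], []
    for i in range(2 ** (q + 1)):
        ones = bin(i).count("1")        # number of ones in i
        zeros = (q + 1) - ones          # number of zeros in i
        # Taps hit by a positive pulse, minus p times the sum of all taps.
        y = sum(h[k] for k in range(q + 1) if (i >> (q - k)) & 1) - p * sum(h)
        values.append(y)
        probs.append(p ** ones * (1 - p) ** zeros)
    return values, probs


# Impulse response of system 1 in Table 1; p = 0.3 is an arbitrary choice.
values, probs = output_distribution([1, -2, 3, 1.5, -0.8], p=0.3)
mean = sum(v * pr for v, pr in zip(values, probs))
```

The probabilities sum to one and the mean vanishes up to rounding, which matches the statement that the output signal is zero mean.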

The third-order cumulant $C_{3,y}(\tau_1, \tau_2)$ of the output
signal $y(n)$ for an MA system of order $q$, which was excited
with non-Gaussian distributed white noise, was derived in
[5] as

\[ C_{3,y}(\tau_1, \tau_2) = \gamma_{3,w} \sum_{n=0}^{q} h(n)\, h(n+\tau_1)\, h(n+\tau_2). \tag{4} \]

In simulations, the third-order cumulant value
$\hat{C}_{3,y}(\tau_1, \tau_2)$ is estimated from $N$ samples of the
output signal $y(n)$ as follows:

\[ \hat{C}_{3,y}(\tau_1, \tau_2) = \frac{1}{K} \sum_{n=1}^{K} y(n)\, y(n+\tau_1)\, y(n+\tau_2), \tag{5} \]

where $K = N - \max(\tau_1, \tau_2)$. Treat each partial product in (5)
as a random variable. For the case of superimposed signals,
each of them depends on the system response $h(n)$ and on
$q + \max(\tau_1, \tau_2) + 1$ subsequent samples of the Bernoulli
distributed input noise. Altogether, $2^{q+\max(\tau_1,\tau_2)+1}$
different values could be obtained for one partial product.
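
A small Monte Carlo sketch may help to see the sample estimate (5) next to the theoretical value (4). It is an illustration only: the function names, the two-tap test system and the seed are hypothetical, and the value $\gamma_{3,w} = p(1-p)(1-2p)$ is the third central moment of an input that equals $1-p$ with probability $p$ and $-p$ otherwise, worked out here rather than quoted from the paper.

```python
import random


def simulate_ma_output(h, p, n_samples, rng):
    """Filter zero-mean Bernoulli noise (1-p w.p. p, else -p) through h."""
    q = len(h) - 1
    w = [(1 - p) if rng.random() < p else -p for _ in range(n_samples + q)]
    # y(n) = sum_k h(k) w(n - k); the first q inputs only warm up the filter.
    return [sum(h[k] * w[n + q - k] for k in range(q + 1))
            for n in range(n_samples)]


def third_order_cumulant_estimate(y, tau1, tau2):
    """Sample estimate of Eq. (5) for non-negative lags tau1, tau2."""
    k_terms = len(y) - max(tau1, tau2)
    total = sum(y[n] * y[n + tau1] * y[n + tau2] for n in range(k_terms))
    return total / k_terms


def third_order_cumulant_theory(h, p, tau1, tau2):
    """Eq. (4) with gamma_{3,w} = p(1-p)(1-2p) for this Bernoulli input."""
    gamma3 = p * (1 - p) * (1 - 2 * p)
    q = len(h) - 1
    return gamma3 * sum(h[n] * h[n + tau1] * h[n + tau2]
                        for n in range(q + 1 - max(tau1, tau2)))


rng = random.Random(0)              # fixed seed, an arbitrary choice
h_demo, p_demo = [1.0, 0.5], 0.3    # hypothetical test system
y = simulate_ma_output(h_demo, p_demo, 20000, rng)
estimate = third_order_cumulant_estimate(y, 0, 0)
theory = third_order_cumulant_theory(h_demo, p_demo, 0, 0)
```

Repeating the run with different seeds and lengths gives an empirical picture of how the spread of `estimate` around `theory` shrinks with the signal length.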

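Based on the structure of the superimposed signal $y(n)$ in
Eq. (2) and using the calculation scheme for the third-order
cumulant, the following probability distribution function
results for the partial products from Eq. (5):

\[ p(X_i) = (1-p)^{k_{0,i}}\, p^{k_{1,i}}, \qquad i = 0, 1, \ldots, 2^{q+\max(\tau_1,\tau_2)+1}-1, \tag{6} \]

where $k_{0,i}$ and $k_{1,i}$ count the zeros and ones in the
$(q+\max(\tau_1,\tau_2)+1)$-digit binary representation of $i$, so that
the first value $X_0$ has the probability $(1-p)^{q+\max(\tau_1,\tau_2)+1}$
and the last value the probability $p^{q+\max(\tau_1,\tau_2)+1}$.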

The values of the random variable are indexed from $X_0$ to
$X_{2^{q+\max(\tau_1,\tau_2)+1}-1}$. The relation between the values
$X_i$ of this random variable and the values of the random
variable $Y$ is the following:

\[ X_i = Y_{(i \bmod 2^{q+1})}\; Y_{((i \bmod 2^{q+1+\tau_1}) \div 2^{\tau_1})}\; Y_{((i \bmod 2^{q+1+\tau_2}) \div 2^{\tau_2})}, \tag{7} \]

where the operator mod means the remainder of the integer
division and the operator $\div$ stands for integer division.
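
The indexing of partial products by bit patterns can be sketched in code. The Python below is a hedged illustration, not the authors' implementation: the helper names are hypothetical, the bit convention (bit $q-k$ of a window selects the pulse applied to $h(k)$) is an assumption, and $\gamma_{3,w} = p(1-p)(1-2p)$ is again worked out for the Bernoulli input described earlier. It enumerates all values $X_i$ with their probabilities and computes the mean and the single-product variance as the second moment minus the squared mean.

```python
def output_value(j, h, p):
    """Output value for bit pattern j; bit (q - k) of j selects the pulse
    applied to h[k] (an assumed bit convention)."""
    q = len(h) - 1
    return sum(h[k] for k in range(q + 1) if (j >> (q - k)) & 1) - p * sum(h)


def partial_product_distribution(h, p, tau1, tau2):
    """Values X_i via the mod / integer-division relation, with p(X_i)."""
    q = len(h) - 1
    n_bits = q + max(tau1, tau2) + 1
    values, probs = [], []
    for i in range(2 ** n_bits):
        a = i % 2 ** (q + 1)
        b = (i % 2 ** (q + 1 + tau1)) // 2 ** tau1
        c = (i % 2 ** (q + 1 + tau2)) // 2 ** tau2
        values.append(output_value(a, h, p) * output_value(b, h, p)
                      * output_value(c, h, p))
        ones = bin(i).count("1")
        probs.append(p ** ones * (1 - p) ** (n_bits - ones))
    return values, probs


h1, p = [1, -2, 3, 1.5, -0.8], 0.3      # h1 is system 1 of Table 1
values, probs = partial_product_distribution(h1, p, tau1=1, tau2=0)
m = sum(x * pr for x, pr in zip(values, probs))               # mean
var1 = sum(x * x * pr for x, pr in zip(values, probs)) - m * m
# Theoretical cumulant at lag (1, 0) for comparison.
c3 = p * (1 - p) * (1 - 2 * p) * sum(h1[n] * h1[n + 1] * h1[n] for n in range(4))
```

The mean `m` agrees with the theoretical cumulant `c3` up to rounding, which is a useful sanity check on the bit convention before any variance is computed.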

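For an arbitrary lag $(\tau_1, \tau_2)$ in the third-order cumulant,
$N$ such partial products are summed up and averaged, giving
the following variance of $\hat{C}_{3,y}(\tau_1, \tau_2)$:

\[ \begin{aligned}
Var_N &= \frac{1}{N^2} \sum_{i=1}^{N} E\big[(X^{(i)} - m)^2\big] + \frac{2}{N^2} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} E\big[(X^{(i)} - m)(X^{(j)} - m)\big] \\
&= \frac{Var_1}{N} + \frac{2}{N^2} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} E\big[X^{(i)} X^{(j)}\big] - \frac{N(N-1)}{N^2}\, m^2,
\end{aligned} \tag{8} \]

where $X^{(i)}$ denotes the $i$-th partial product, $m$ its mean,
and $Var_1$ means:

\[ Var_1 = \sum_{i=0}^{2^{q+\max(\tau_1,\tau_2)+1}-1} (X_i - m)^2\, p(X_i). \tag{9} \]

The second part of Eq. (8) is marked as $Cov_1$ and
written:

\[ Cov_1 = \frac{2}{N^2} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} E\big[(X^{(i)} - m)(X^{(j)} - m)\big] = \frac{2}{N^2} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} E\big[X^{(i)} X^{(j)}\big] - \frac{N-1}{N}\, m^2. \tag{10} \]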


The cumulant value $\hat{C}_{3,y}(\tau_1, \tau_2)$ is obtained as a sum of
an arbitrary number of such random variables. But the values
$X_i$ of this random variable are dependent: when one partial
product has the value $X_i$ with the probability $p(X_i)$, its
successor has either the value $X_{(i \div 2) + 2^{q+\max(\tau_1,\tau_2)}}$ with
the probability $p(X_i)\,p$ or the value $X_{(i \div 2)}$ with the
probability $p(X_i)(1-p)$:

\[ X_i^{(n)} \rightarrow X_{(i \div 2) + 2^{r}}^{(n+1)} \quad \text{or} \quad X_i^{(n)} \rightarrow X_{(i \div 2)}^{(n+1)}. \tag{11} \]
system   order q   phase     impulse response
1        4         mixed     h1 = [1 -2 3 1.5 -0.8]
2        4         minimum   h2 = [1 1.3312 0.3272 0.2352 0.2568]
3        1         maximum   h3 = [1 4]
4        1         maximum   h4 = [1 -4]

Table 1. The impulse responses of the tested MA systems


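In Eq. (11) the value $r = q + \max(\tau_1, \tau_2)$ was used. Let's
mark $m_1 = m$ and compute the average of partial
sums, when any two partial products are summed up:

\[ \begin{aligned}
m_2 &= \frac{1}{2} \sum_{i=0}^{2^{r+1}-1} \Big[ \big(X_i + X_{(i \div 2)+2^r}\big)\, p(X_i)\, p + \big(X_i + X_{(i \div 2)}\big)\, p(X_i)(1-p) \Big] \\
&= \frac{1}{2} \Big[ \sum_{i=0}^{2^{r+1}-1} X_i\, p(X_i) + \sum_{i=0}^{2^{r+1}-1} X_{(i \div 2)+2^r}\, p(X_i)\, p + \sum_{i=0}^{2^{r+1}-1} X_{(i \div 2)}\, p(X_i)(1-p) \Big] \\
&= \frac{1}{2} \Big[ m + \sum_{i=0}^{2^r-1} X_i \underbrace{\big[p(X_{2i}) + p(X_{2i+1})\big](1-p)}_{p(X_i)} + \sum_{i=2^r}^{2^{r+1}-1} X_i \underbrace{\big[p(X_{2i-2^{r+1}}) + p(X_{2i-2^{r+1}+1})\big]\, p}_{p(X_i)} \Big] \\
&= \frac{1}{2}(m + m) = m.
\end{aligned} \tag{12} \]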
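
The claim that a shift by one sample leaves the distribution of partial products unchanged, and hence the mean equal to $m$, can also be checked numerically. The following Python sketch is an illustration only (function names and the values of `r` and `p` are hypothetical): it pushes the probabilities $p(X_i)$ through one step of the successor rule of Eq. (11) and compares the result with the original probabilities.

```python
def state_probability(i, n_bits, p):
    """p(X_i) = p**ones * (1-p)**zeros over an n_bits-digit binary index."""
    ones = bin(i).count("1")
    return p ** ones * (1 - p) ** (n_bits - ones)


def successor_distribution(r, p):
    """Push p(X_i) through one step of the successor rule in Eq. (11)."""
    n_states = 2 ** (r + 1)
    nxt = [0.0] * n_states
    for i in range(n_states):
        p_i = state_probability(i, r + 1, p)
        nxt[(i // 2) + 2 ** r] += p_i * p      # new input is the positive pulse
        nxt[i // 2] += p_i * (1 - p)           # new input is the negative pulse
    return nxt


r, p = 5, 0.3                                  # e.g. q = 4 and max(tau1, tau2) = 1
current = [state_probability(i, r + 1, p) for i in range(2 ** (r + 1))]
following = successor_distribution(r, p)
max_gap = max(abs(a - b) for a, b in zip(current, following))
```

Because the two probability vectors coincide up to rounding, every partial product in the sum has the same distribution and therefore the same mean.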


Let's prove that the underbraced expressions in Eq. (12)
equal $p(X_i)$. But first we have to prove the following:

left). Considering Eqs. (13) and (14), we proved that the
first underbrace in Eq. (12) equals $p(X_i)$. Let's prove
the same for the second underbrace. Because the binary
representation of the number $2i + 1$ has one 1 more than
the binary representation of the number $2i - 2^{r+1} + 1$, if
$i \geq 2^r$, the following auxiliary result could be written:

P(-X’2i~2’'+i+l)  =  (15) 

If $p(X_{2i+1})$ is expressed from Eq. (13) and put into Eq.
(15), the value of the second underbrace in Eq. (12) is
proven as well.

With derivation (12) it was shown that the average of partial
sums, when summing up two partial products, equals
the average of the random variable which gives the possible
values of the partial products when calculating the cumulant
value $\hat{C}_{3,y}(\tau_1, \tau_2)$. Following the same procedure, it could
be proven that this average always remains the same when
summing up $N$ partial products, and equals $m_N = m$.

But the main question is how the variance changes.
Our starting point is the variance $Var_1$ of the random
variable which represents the values of partial
products. How does the variance $Var_N$ change when,
calculating the cumulant value $\hat{C}_{3,y}(\tau_1, \tau_2)$, $N$ partial
products are summed up? First, the variance $Var_1$ could be
written as follows:


\[ Var_1 = \sum_{i=0}^{2^{r+1}-1} X_i^2\, p(X_i) - m^2. \tag{16} \]


\[ p(X_{2i})\, p = p(X_{2i+1})(1-p). \tag{13} \]

Following the definition in (6), $p(X_{2i}) = (1-p)^{k_0} p^{\,r+1-k_0}$
and $p(X_{2i+1}) = (1-p)^{k_0-1} p^{\,r+2-k_0}$. The constant $k_0$
represents the number of zeros in the $(r+1)$-digit binary
representation of the number $2i$. If the probability $p(X_{2i})$
is multiplied by $p$ and the probability $p(X_{2i+1})$ by $1-p$, the
same expression results on both sides of Eq. (13). We
can also write the relation:

\[ p(X_i) = p(X_{2i}), \tag{14} \]

because the number of ones in the binary representation
remains the same if the number is multiplied by 2 (shift for
one digit

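When $j - i > q + \max(\tau_1, \tau_2)$ in Eq. (10), the
expression becomes constant and equal to $m^2$. When
we deal with length $N$ and the MA system of order $q$,
we have $(N - q - \max(\tau_1,\tau_2))(N - q - \max(\tau_1,\tau_2) - 1)/2$
such terms, each of which gives $m^2$. The part with the value
$m^2$ is equal to:

\[ \left[ \frac{(N - q - \max(\tau_1,\tau_2))(N - q - \max(\tau_1,\tau_2) - 1)}{N^2} - \frac{N-1}{N} \right] m^2 = \frac{[-2N + q + \max(\tau_1,\tau_2) + 1]\,[q + \max(\tau_1,\tau_2)]}{N^2}\, m^2. \tag{17} \]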
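
The closed form in Eq. (17) is a pure counting identity, so it can be verified by brute force. The sketch below is an illustration with hypothetical function names: it counts the index pairs with $j - i > r$, where $r = q + \max(\tau_1, \tau_2)$, and compares the resulting coefficient of $m^2$ with the closed form, using exact rational arithmetic. The identity is only checked for $N > r$.

```python
from fractions import Fraction


def m2_coefficient_by_counting(n, r):
    """Coefficient of m**2: (2/N**2) * #{(i, j): j - i > r} - (N - 1)/N."""
    far_pairs = sum(1 for i in range(1, n) for j in range(i + 1, n + 1)
                    if j - i > r)
    return Fraction(2 * far_pairs, n * n) - Fraction(n - 1, n)


def m2_coefficient_closed_form(n, r):
    """Right-hand side of Eq. (17) with r = q + max(tau1, tau2)."""
    return Fraction((-2 * n + r + 1) * r, n * n)


r_demo = 5                      # e.g. q = 4 and max(tau1, tau2) = 1
agree = all(m2_coefficient_by_counting(n, r_demo)
            == m2_coefficient_closed_form(n, r_demo)
            for n in range(r_demo + 1, 40))
```

For large $N$ the coefficient behaves like $-2r/N$, which shows that this part of the variance also decays with the signal length.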

Figure 1. Variance of the cumulant value $\hat{C}_{3,y}(q, 0)$ depending
on the length of the output signal $y(n)$ and the system used

Other parts, when $j - i \leq q + \max(\tau_1, \tau_2)$, could be
calculated as:

2'-+'-l  2^-1 

t=rO  k=0 

*'^(i-r2J)+rev(comp?(A:))-229“i  j  (18) 

where $rev(compl(k))$ stands for the reversed-bit-order
value of the 1's complement of $k$, and $\div$ is the integer
division.

At length $N$, we have $N-1$ terms at the distance $j - i = 1$,
$N-2$ terms at the distance $j - i = 2$, and so on. At the
distance $j - i = q$, we have $N - q$ terms, and at the distance
$j - i = q + \max(\tau_1, \tau_2)$ only $N - q - \max(\tau_1, \tau_2)$ terms.
The final expression for the variance of the cumulant value
$\hat{C}_{3,y}(\tau_1, \tau_2)$ using an output signal $y$ of $N$ samples is:

,  [-2iV  +  g  +  max(ri,T2)  +  l][g  +  max(Ti,r2)]_2 

H - 


The variance of the sum of random variables denoting
partial products was split into the variance $Var_1$ and the
covariance $Cov_1$ of the individual partial products. From Eq.
(19) it is evident that both parts decrease when longer output
signals are used for the cumulant calculation. The
contribution of $Var_1$ is significant, because it shows
how fast the estimation error will decrease. The
covariance $Cov_1$, which is the second part in Eq. (19), could
lower the total variance, but by less than one order of
magnitude. With the help of Eq. (19), it could be predicted how
fast, or how accurately, the cumulants of the system will be
computed with the given length of the system output signal.

3.  Variance  analysis  and  simulation  results 

In Fig. 1, four examples of the variance calculation
are presented: the first with a mixed-phase system, the
second with a minimum-phase system, and the last two with
maximum-phase systems. The dotted line depicts the result
of the analytical model in Eq. (19), the dashed line the
approximation from Eq. (5), and the solid line the variance
obtained in computer simulations with signals of the lengths
indicated on the x-axis.

Figure 2. Analytical variances of the cumulant values
$\hat{C}_{3,y}(\tau_1, \tau_2)$ depending on lags $\tau_1$ and $\tau_2$ (system
h=[1 0.8 -1.6 1.2]); left $Var_1(\tau_1, \tau_2)/N$, right
$Var_N(\tau_1, \tau_2)$; $N = 1000$

In Table 1, the actual impulse responses with their
orders $q$ are gathered. In all four simulation tests, 30
independent Monte Carlo runs were performed. It was noticed
that the simulation variance was well estimated by expression
(19). On the other hand, the simple estimate $Var_1/N$ is
not so exact, but in some examples where the sum of the
samples in the impulse response is strongly positive, it
happens that the simpler estimate $Var_1/N$ is even more
accurate than the complete expression (19).

Additionally, the variance structure inside the third-order
cumulant was studied (Fig. 2). The difference between the
approximation $Var_1(\tau_1, \tau_2)/N$ and the analytical model
$Var_N(\tau_1, \tau_2)$ is shown. The system used has mixed phase.
It is evident that the variances are close to zero when the
lags $\tau_1$ and $\tau_2$ are bigger ($\tau_1 > q$, $\tau_2 > q$ and
$|\tau_1 - \tau_2| > q$). It could be noticed as well that the
variance is much larger for the cases $\tau_1 = \tau_2$, $\tau_1 = 0$
and $\tau_2 = 0$. This should be considered when new
identification methods are designed.

4.  Conclusions 

Starting with the analytical variance model of the
third-order cumulant $C(\tau_1, \tau_2)$, the variance of the identified
system response can be obtained analytically as well,
taking into account the identification method applied. Such a
variance model leads, on the one hand, to a confidence measure
of the system identified at a certain $N$; on the other hand, it
indicates the systems that generate too high a variance to be
identified in finite conditions.

Observing  Eqs.  (19)  and  (18),  it  is  evident  that  com¬ 
putational  complexity  of  the  variance  model  developed  is 
>^2)4-1)  ajjjj  it  ig  dependent  on  the  signal 


length  N  used  in  the  cumulant  estimation.  Therefore,  this 
SISO  variance  model  may  also  be  implemented  as  a  build¬ 
ing  block  in  the  multichannel  MISO  and  MIMO  variance 
models  with  a  reasonable  complexity. 
Abstract

In this paper, we present the minimum entropy
fusion approach for multisensor data fusion in
non-Gaussian environments. We represent the fused data in
the form of the weighted sum of the multisensor outputs
and use the varimax norm as the information measure.
The optimum weights are obtained by maximizing the
varimax norm of the fused data. The minimum entropy
fusion solution depends only on the empirical distribution
of the sensor data and makes no specific distribution
assumptions about the sensor data. Numerical
simulation results are provided to show the effectiveness
of the proposed fusion approach.


1.  Introduction 

In recent years, the problem of multisensor fusion
has received considerable attention [1][2]. Multisensor
fusion plays an important role in the operation of
intelligent machines and systems. The primary aim of sensor
fusion is to gain more accurate information with reduced
uncertainty by utilizing redundant information from
multiple sensors. The potential advantages of integrating
information from multiple sensors correspond to the
notions of redundancy, complementarity, timeliness and
cost of the information, respectively [1]. In the past
decades, many fusion techniques have been developed
[3][4][5]. The minimum variance approach by
Durrant-Whyte [3] is based on the uncertainty modeling
of multisensor systems and uses Bayesian inference
theory. The weighted least squares method by
Matthies and Shafer [4] is a variation of the minimum
variance approach which takes the covariance information
into consideration. The geometric approach proposed by
Nakamura and Xu [5] associates geometric uncertainty
with the covariance ellipsoid of the fused
sensory information. The optimum fusion is obtained
by minimizing the geometric volume of the ellipsoid.
These techniques are developed under the Gaussian
assumption, and their performance usually deteriorates
when the underlying signal and noise distributions
deviate from the assumed Gaussian. It is known that
the Gaussian assumption is made for mathematical
convenience and is usually justified only in theory by the
central limit theorem. In practical applications,
however, it does not generally hold and is often violated.


In this paper, we present the minimum entropy
fusion approach for multisensor data fusion in
non-Gaussian environments. We consider the fusion in the
scope of linear combination, that is, the fused
information is represented as the weighted sum of the sensor
outputs. We use the varimax norm as the information
measure. The varimax norm has been defined by data
analysts [6][7] in trying to find a simple representation
of a set of orthogonal vectors. It measures the simplicity
of a signal, and maximizing the varimax norm has the
effect of simplifying the appearance, or reducing the
entropy, of a signal. The minimum entropy deconvolution
(MED) [8] by Wiggins is one successful application of the
varimax norm to the blind deconvolution problem. We
propose the minimum entropy fusion solution, which
minimizes the entropy, or equivalently maximizes the
varimax norm, of the fused sensor data. The minimum
entropy fusion solution depends only on the empirical
distribution of the sensor data and makes no distribution
assumptions about the sensor data. Numerical simulation
results are provided to show the effectiveness of the
proposed fusion approach.
Figure 1. Multisensor data fusion model.


2. Mathematical Model for Multisensor Data Fusion

Consider a multisensor fusion system with $M$ ($M > 2$)
sensors, as shown in Figure 1. We assume that each
sensor output is corrupted with additive noise of
arbitrary distribution. The $m$th sensor measurement
can be expressed as

\[ x_m(t) = s(t) + n_m(t), \qquad m = 1, 2, \ldots, M, \tag{1} \]

where $s(t)$ denotes the signal and $n_m(t)$ represents the
additive noise of the $m$th sensor. We assume that $n_m(t)$
is an independent, identically distributed (i.i.d.) process
of zero mean. The objective of fusion is to combine the
sensory information in a systematic manner to get a
good consensus estimate of $s(t)$. We consider fusion in the
scope of linear combination, that is, the fused information
is given as a linear combination of the $M$ sensor
measurements

\[ x(t) = \sum_{m=1}^{M} w_m x_m(t), \tag{2} \]

where $\{w_m, m = 1, 2, \ldots, M\}$ denote the weighting
coefficients. The objective of fusion is to find the
optimum weights that minimize the uncertainty of the
fused information.

3. Minimum Variance Fusion Approach

The most widely used uncertainty measure in fusion
is the variance. The minimum variance fusion can
be derived from different perspectives; they all basically
minimize the variance of the fused sensory information.
Consider the mean of the fused information $x(t)$

\[ E[x(t)] = \sum_{m=1}^{M} w_m E[x_m(t)] = \sum_{m=1}^{M} w_m\, s(t), \tag{3} \]

where $E$ denotes the expectation. To make $x(t)$ an
unbiased estimate of $s(t)$, we impose the constraint
$a^T w = 1$, where $a = [1, 1, \ldots, 1]^T$. The variance of
$x(t)$ is given by

\[ J_{MV} = E\{w^T n(t)\, n^T(t)\, w\} = w^T R_n w, \tag{4} \]

where

\[ w = [w_1, w_2, \ldots, w_M]^T, \qquad n(t) = [n_1(t), n_2(t), \ldots, n_M(t)]^T, \]

and $R_n$ denotes the covariance of $n(t)$. The minimum
variance fusion solution can be obtained from the following
linearly constrained optimization problem

\[ w_{MV} = \arg\min_{w} J_{MV}, \tag{5} \]

subject to the constraint $a^T w = 1$. Using the method
of Lagrange multipliers, the optimum solution $w_{MV}$
can be obtained as

\[ w_{MV} = \frac{R_n^{-1} a}{a^T R_n^{-1} a}. \tag{6} \]

4. Varimax Norm and Minimum Entropy Fusion

The minimum variance fusion solution is based on
the second-order statistics of the sensory information
and is optimum only under the Gaussian assumption.
Its performance usually deteriorates when the underlying
Gaussian model is violated. It is known that Gaussian
processes are characterized by their second-order
statistics, while non-Gaussian processes contain valuable
information in their higher-order moments which
can be exploited for improved performance. In this paper,
we use the varimax norm as an information measure
to study the problem of multisensor data fusion in
non-Gaussian environments.

The varimax norm has been defined by data analysts
[6][7] in trying to find a simple representation of a set
of orthogonal vectors. It measures the simplicity of a
signal, and maximizing the varimax norm has the effect
of simplifying the appearance, or reducing the entropy, of
a signal. We usually regard the varimax norm as another form
of representing the entropy. The varimax norm of the
fused information $x(t)$ is defined as

\[ V[x(t)] = \frac{\sum_{t=1}^{N} x^4(t)}{\left[\sum_{t=1}^{N} x^2(t)\right]^2}, \tag{7} \]
where $N$ is the number of sensor data. Note that $V(y)$
is related to the sample kurtosis of $\{y(t)\}$, which is
defined as the zero-lag value of the fourth-order cumulant
normalized by the variance. The varimax norm depends
only on the empirical distribution of the sensor data
$\{y(1), y(2), \ldots, y(N)\}$ viewed as a simple random sample.
The varimax norm is also scale invariant. When
using the varimax norm as the uncertainty measure for
fusion, some constraints need to be imposed on
the weighting coefficients to make the optimization
solution unique. We propose the minimum entropy fusion
solution as

\[ w_{ME} = \arg\max_{w} V[x(t)], \tag{8} \]

subject to the constraint $a^T w = 1$. The optimization
(8) is nonlinear and numerical techniques are needed.
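
To make the optimization in (8) concrete, the sketch below computes a varimax norm and runs a crude one-dimensional search for two sensors. It is an illustration only and not the authors' numerical technique: it assumes the usual varimax form, the sum of fourth powers divided by the squared sum of squares, and the function names, the grid search, the restriction to non-negative weights and the toy data are all hypothetical choices.

```python
def varimax_norm(x):
    """Assumed varimax form: sum(x**4) / (sum(x**2))**2."""
    s2 = sum(v * v for v in x)
    if s2 == 0:
        return 0.0                  # convention chosen here for an all-zero input
    s4 = sum(v ** 4 for v in x)
    return s4 / (s2 * s2)


def min_entropy_weights_two_sensors(x1, x2, steps=200):
    """Grid search over w1 in [0, 1] with w2 = 1 - w1 (so a^T w = 1),
    keeping the weights that maximize the varimax norm of the fused data."""
    best_w1, best_v = 0.0, float("-inf")
    for k in range(steps + 1):
        w1 = k / steps
        fused = [w1 * a + (1 - w1) * b for a, b in zip(x1, x2)]
        v = varimax_norm(fused)
        if v > best_v:
            best_w1, best_v = w1, v
    return (best_w1, 1 - best_w1), best_v


flat = varimax_norm([1, 1, 1, 1])       # evenly spread energy
spiky = varimax_norm([1, 0, 0, 0])      # energy in a single sample
weights, best = min_entropy_weights_two_sensors([1, 0, 0, 0], [1, 1, 1, 1])
```

In this toy case the search puts all the weight on the spiky sequence, since any admixture of the flat sequence lowers the norm. For more sensors a gradient-based or other constrained optimizer would replace the grid.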

5.  Performance  Analysis 

In  this  section,  we  show  some  properties  of  the  min¬ 
imum  entropy  fusion  solution.  The  theoretical  results 
in  [9]  do  provide  some  useful  intuitive  justifications  for 
it. 

Let $\mathcal{X}$ be a set of random variables with finite
variances which is closed under linear combinations, i.e.,
$\sum_i a_i x_i + c$ is in $\mathcal{X}$ when the $x_i$ are, where $a_i$ and $c$
are constants. We give the definitions of equivalency and
inequality for random variables as follows.

Definition For two random variables $x$ and $y$: $x \doteq y$
if, for some constants $a \neq 0$ and $c$, $ax + c$ has the same
probability distribution as $y$; and $x \,\dot{\geq}\, y$ if, for a set
of constants $a_i$, $y \doteq \sum_i a_i x_i$, where the $x_i$
are independent samples of $x$. The notation "$\dot{>}$" means
"$\dot{\geq}$ but not $\doteq$".

The variables $x$ and $y$ are regarded as equivalent
when $x \doteq y$. The relation $\dot{\geq}$ is a partial order on
$\mathcal{X}$ and has the properties of transitivity and antisymmetry,
that is, if $x \,\dot{\geq}\, y$ and $y \,\dot{\geq}\, z$ then $x \,\dot{\geq}\, z$;
if $x \,\dot{\geq}\, y$ and $y \,\dot{\geq}\, x$, then $x \doteq y$. The
equivalence $\doteq$ and the partial order $\dot{\geq}$ also
characterize the Gaussianity of random variables. The
following lemma provides the order relations of random
variables [9].

Lemma For $x$ in $\mathcal{X}$ and $z$ Gaussian,

\[ x \,\dot{\geq}\, \sum_i a_i x_i \,\dot{\geq}\, z, \tag{9} \]

and (9) is strict unless either (a) $x$ is Gaussian or (b)
$x$ is not Gaussian, but the linear combination is trivial
(no two $a_i$'s are nonzero).


Relation (9) implies that linear combinations of
independent random variables are more Gaussian than
any individual component of the combination. Thus,
we usually interpret the partial order $x \,\dot{\geq}\, y$ as $y$ being
more Gaussian than $x$.

The kurtosis has traditionally been used in statistics
for measuring the Gaussianity of a random variable
(the kurtosis of a Gaussian process is zero). In [9], it
has been shown that the varimax norm agrees with $\dot{\geq}$
on $\mathcal{X}$, that is,

\[ x \,\dot{\geq}\, y \ \text{ implies } \ V(x) \geq V(y), \tag{10} \]

for every $x$ and $y$ which are filtered white samples from
$\mathcal{X}$.

Assume that $s(t)$ and $\{n_m(t), m = 1, 2, \ldots, M\}$ are
independent samples from a non-Gaussian $\mathcal{X}$. With
the constraint $a^T w = 1$, the fused information can be
written as

\[ x(t) = s(t) + n_f(t), \tag{11} \]

where $n_f(t) = \sum_{m=1}^{M} w_m n_m(t)$ denotes the fused
sensor noise information. Then, we have $x(t) \,\dot{\leq}\, n_f(t)$
and

\[ V[x(t)] \leq V[n_f(t)]. \tag{12} \]

As indicated by (12), when no optimization is performed
over $\{w_m\}$, the fused information would have
more uncertainty than the fused noise information.
However, maximizing the varimax norm of the fused
information has the effect of maximizing the varimax
norm of the fused noise information, or in other words,
minimizing the uncertainty of the fused noise information.

6.  Numerical  Simulations 

We provide a numerical example to show the effectiveness
of the proposed minimum entropy solution.
More complete numerical studies are still under
investigation. In the simulation, we use $M = 4$ and $N = 200$.
Assume that the signal is Gaussian distributed and
the sensor noise follows the one-sided exponential
distribution. Figure 2 shows the mean square error of the
fused sensory information versus noise variance. The
comparison is made with the minimum variance fusion
approach. It is evident that the minimum entropy
fusion shows improved performance over the minimum
variance fusion for all the noise variances used in the
test.
[9] D. Donoho, "On minimum entropy deconvolution",
in Applied Time Series Analysis II, Academic
Press, pp. 565-608, 1981.


Figure  2.  Fusion  by  the  minimum  entropy  and 
the  minimum  variance  approaches. 
Abstract :

In underwater acoustics, the signal received by sensors is a mixture of different elementary sources, filtered
by the environment. In blind separation of sources, we can isolate each source from different mixtures of
sources without any a priori information, except for assuming statistical independence of the different
sources. Two French researchers, J. Herault and C. Jutten, had earlier proposed a neuromimetic solution to
the problem.

In our work, we use this solution to separate convolutive mixtures of simulated complex underwater signals
in a shallow water environment. To allow multipath identification, a whitening step has to be
introduced. We propose a local whitening procedure that does not impact the separated signal output and
preserves the signal characteristics.

This promising technique can be improved using non-causal whitening filters better adapted to the target
environment.


Keywords : shallow water propagation, source separation, multipath identification, whitening.


1.  Introduction 

The Blind Separation of Sources problem arises
in different fields such as astronomy, astrophysics,
underwater acoustics and medical applications. Depending on
the domain, one can focus on an object (star, ship,
electrocardiogram wave, etc.) that produces some signals
(optical, electromagnetic, acoustic signals, etc.).

In underwater acoustics, the signal received by
sensors is a mixture of different elementary sources,
filtered by the environment. For example, these different
sources could be the signatures of vessels, the noise made
by the environment, self-noise, etc.

The optimal use of the sensors would be to separate
all these sources. This issue is called blind separation of
sources because we do not have any a priori knowledge
about these sources: the only hypothesis made is that
they are statistically independent.


The sources pass through a transmission field and are
received on a set of sensors.

The transmission field is supposed to be isotropic,
deterministic, stationary and, furthermore, linear. So the
received signals can be considered as linear mixtures of
the initial sources.

Using only the knowledge of the received signals,
one can try to characterise the initial sources.

The sources are supposed to be in a shallow water
environment. The signals are simulated and the mixtures
received by the sensors are convolutive.

The separated signals will be used for several
purposes such as detection, noise reduction, classification,
etc. For these applications it is important to preserve as
much as possible the characteristics of the initial sources.
The proposed algorithm is designed with this in mind.
2. Propagation characteristics

In the shallow water environment, the signal
received by the hydrophone is the sum of the direct
sound and the echoes of the source.

Depending on the position of the source and the
receiver, the propagation effect can be modelled as a linear
time-invariant filter $H_{ij}$. The impulse response $h_{ij}(t)$ of the
filter can be estimated with GAMARAY, a ray-based
model developed by E. K. Westwood [5].

Most of the taps of the impulse response are equal
to zero; the non-null taps are representative of the direct
sound and the echoes. Typical impulse responses of the
filters, for two different source-receiver positions, are
shown in figure 1:


[Figure 1 panels: H12 filter and H21 filter; horizontal axis: time (seconds)]


Starting from white noise processes N_1 and N_2, the initial sources I_1 and I_2 are obtained by linear filtering with F1 and F2 respectively.

They then propagate through the medium. During the propagation to each sensor R_1 and R_2, a transformation is performed by the filters H11 and H22 respectively, and the coloured sources S_1 and S_2 are obtained.

It is clear that the filters H11 and H22 cannot be estimated using independence properties of the sources, because all the signals on a given line (N_i, I_i, S_i) are independent of the signals on another line (N_j, I_j, S_j). Further hypotheses have to be stated to define the point to be reached.

The  signals  received  on  R_1  and  R_2  are 
convolutive  mixtures  of  S_1  and  S_2.  The  filters  H12  and 
H21  are  a  model  of  the  propagation  in  shallow  water. 

The filters H12 and H21 can be estimated by restoring the independence of the outputs of a system built as a mirror image of the propagation.


3.  Source  separation  principle 

The algorithm proposed in [3] can be seen as an inverse propagation procedure.


Figure 1 : Propagation effect in shallow water environment. Impulse responses of two different filters H12 and H21 estimated by GAMARAY.


The received signal rj(t) on hydrophone j is the sum of the contributions of each source signal, filtered by the environment:

$$ r_j(t) = \sum_{i} h_{ij}(t) * s_i(t), \qquad i \in \{1, \dots, M\} \qquad (1) $$

For two hydrophones R_1 and R_2 and two sources N_1 and N_2, the received signals can be modelled as convolutive mixtures, as shown in Figure 2.
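As an illustration of the convolutive model (1), the following minimal sketch builds the two received signals from two sources and four sparse impulse responses. The tap values, the signal lengths and the function names `convolve` and `mix` are hypothetical and chosen only for the example; they are not the GAMARAY responses used in the paper.

```python
def convolve(h, s):
    """Full linear convolution of impulse response h with signal s."""
    out = [0.0] * (len(h) + len(s) - 1)
    for a, tap in enumerate(h):
        if tap == 0.0:  # most taps are null in a sparse channel
            continue
        for b, sample in enumerate(s):
            out[a + b] += tap * sample
    return out


def mix(sources, filters):
    """Return the received signals r_j(t) = sum_i h_ij(t) * s_i(t).

    filters[i][j] holds the taps of h_ij (source i to sensor j).
    """
    n_sources, n_sensors = len(sources), len(filters[0])
    length = max(
        len(filters[i][j]) + len(sources[i]) - 1
        for i in range(n_sources)
        for j in range(n_sensors)
    )
    received = [[0.0] * length for _ in range(n_sensors)]
    for j in range(n_sensors):
        for i in range(n_sources):
            for t, value in enumerate(convolve(filters[i][j], sources[i])):
                received[j][t] += value
    return received


# Hypothetical sparse channels: direct paths equal to 1, cross paths made of a few echoes.
h11, h22 = [1.0], [1.0]
h12 = [0.0, 0.0, 0.5, 0.0, 0.0, 0.2]
h21 = [0.0, 0.4, 0.0, 0.0, 0.1]
filters = [[h11, h12], [h21, h22]]

s1 = [1.0, 0.0, 0.0, 0.0]  # unit impulse on source 1
s2 = [0.0, 0.0, 0.0, 0.0]  # source 2 silent
r1, r2 = mix([s1, s2], filters)
```

With a unit impulse on the first source and a silent second source, the second sensor simply records the taps of h12, which is a quick way to check the indexing convention.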


[Block diagram with the signal chains N_1, I_1, S_1, R_1 and N_2, I_2, S_2, R_2]

Figure 2 : Received signals generation


Figure 3 : Estimation of the sources S1 and S2 by the inverse propagation procedure.


The problem is now an identification problem, solved with a neural network.

The estimation of the separating filters C12 and C21 is based on the minimisation of the cross information of the network outputs. The cross information can be quantified using various statistics. Quadratic, higher-order or non-linear solutions exist, and the choice depends on the type of application.

From our experience, the results are similar in terms of performance and tracking capability. The different memory parameters involved in the various algorithms have to be optimised depending on the local or statistical properties of the sources.

An important point to underline is that the separating filters C12 and C21 are equal to the mixture filters H12 and H21 if the sources S1 and S2 are equivalent to white noise [4].

As an example, let us consider a mixture of two white noises. The first one is a realisation of a uniform distribution and the second is obtained by taking the arctangent of a Gaussian process.

The filters H11 and H22 are equal to 1, and the impulse responses of filters H12 and H21 present a sparse distribution corresponding to shallow-water propagation (Figure 1).

The independence criterion is the fourth-order cross moment of the estimated signals. 10 000 points are used for this simulation.


[Figure 4 panels: C12 filter and C21 filter, plotted against time (seconds)]

Figure 4 : Estimation of the impulse responses of the two separating filters C12 and C21


The raw results are presented in Figure 4, using a dashed line. One can see that post-processing is possible by identifying the main propagation paths: the impulse responses of the filters C12 and C21 are close to the typical impulse responses used for this simulation (cf. Figure 1).

Because of the particular shape of the filters H12 and H21, only the maxima have to be taken into account to identify the impulse responses of the separating filters C12 and C21.

This reconditioning removes the estimation variance of the null taps of the filters. In some cases, those fluctuations can represent the major part of the separating filters and lead to poor separation results.

4.  Local  whitening  Algorithm 

The underwater sources are not white processes, so a whitening step has to be performed to come back to the previous case and identify the main path components.

If a pre-whitening is performed, the separated sources have to be coloured again before they can be used for identification purposes. To avoid this back-and-forth procedure, we propose to apply the whitening step only where it is needed, that is to say, in the updates of the separation filters C12 and C21.


[Block diagram with inputs R_1, R_2 and outputs S1, S2]

Figure 5 : Estimation of the sources S1 and S2 by the inverse propagation procedure and local whitening.

The  output  equation  is  still  the  same  : 

(2) 

But the independence criteria used for the updates are not estimated directly on the output signals; they are estimated on their whitened versions.

The algorithm can be summarised with the following lines of pseudocode.

For all samples
Begin
    Estimate the whitening filter taps B1 and B2 using the output signals.
    Estimate the independence criterion (Indep) using the whitened output signals.
    Update all the coefficients using :
        Cij (t+1) = Cij (t) + μ Indep
    Estimate the output signals
End
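The loop above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the whitening filters are taken to be LMS-adapted one-step linear predictors, the independence criterion is the quadratic one (product of whitened outputs), the cross filters are strictly causal, and the function name, tap counts and step sizes `mu` and `mu_w` are hypothetical. The output signals are computed at the start of each iteration rather than at the end, which is the same loop rotated by one step.

```python
import math
import random


def local_whitening_separation(r1, r2, n_taps=8, n_white=4, mu=1e-3, mu_w=1e-2):
    """Two-sensor feedback separator; whitening is applied only inside the update."""
    c12 = [0.0] * n_taps   # cross filter driven by output 1, subtracted from sensor 2
    c21 = [0.0] * n_taps   # cross filter driven by output 2, subtracted from sensor 1
    b1 = [0.0] * n_white   # predictor taps used to whiten output 1
    b2 = [0.0] * n_white   # predictor taps used to whiten output 2
    y1, y2, w1, w2 = [], [], [], []

    def past(seq, t, k):
        """Sample seq[t - 1 - k], or 0 before the start of the record."""
        idx = t - 1 - k
        return seq[idx] if idx >= 0 else 0.0

    for t in range(len(r1)):
        # Output signals (not whitened): each sensor minus the filtered other output.
        y1_t = r1[t] - sum(c21[k] * past(y2, t, k) for k in range(n_taps))
        y2_t = r2[t] - sum(c12[k] * past(y1, t, k) for k in range(n_taps))
        y1.append(y1_t)
        y2.append(y2_t)

        # Whitening: prediction errors of the outputs, predictors adapted by LMS.
        e1 = y1_t - sum(b1[k] * past(y1, t, k) for k in range(n_white))
        e2 = y2_t - sum(b2[k] * past(y2, t, k) for k in range(n_white))
        for k in range(n_white):
            b1[k] += mu_w * e1 * past(y1, t, k)
            b2[k] += mu_w * e2 * past(y2, t, k)
        w1.append(e1)
        w2.append(e2)

        # Independence criterion estimated on the whitened outputs (quadratic choice).
        for k in range(n_taps):
            c12[k] += mu * e2 * past(w1, t, k)
            c21[k] += mu * e1 * past(w2, t, k)

    return y1, y2, c12, c21


random.seed(0)
sensor1 = [random.uniform(-1.0, 1.0) for _ in range(500)]
sensor2 = [random.uniform(-1.0, 1.0) for _ in range(500)]
out1, out2, c12, c21 = local_whitening_separation(sensor1, sensor2)
```

Because the outputs themselves are never whitened, they keep the colour of the sources; only the update of C12 and C21 sees the whitened signals. A higher-order or non-linear criterion could replace the product `e2 * past(w1, t, k)` without changing the structure.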

The following example presents the results for underwater signals mixed using the same filters as in the previous example.

The  first  source  is  a  submarine  signal  and  the 
second  a  ship  noise. 

The  sampling  frequency  is  2000  Hz,  and  the  signal 
duration  is  equal  to  10  seconds. 

Using the local whitening algorithm, the estimated impulse responses of the filters C12 and C21 are presented in Figure 6.

The whitening filters B11 and B22 are 50-tap FIR filters.




Figure 6: Estimation of the two separating filters C12 and C21 using the local whitening algorithm.


The results are promising, since the estimated filters C12 and C21 are close to H12 and H21, but they can be improved. The identification of the main propagation paths can therefore also be done for coloured sources.
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The  source  separation  filters  are  only  related  to  the 
propagation  and  not  to  the  colour  of  the  sources. 

With a local whitening, the estimation of the filters is only related to the independence properties of the sources and not to their respective colours.


5.  Conclusion 

We present here the first results of a source separation technique applied to underwater acoustics in shallow water.

We propose a local whitening version of the neuromimetic approach proposed in [4]. The purpose is to avoid unnecessary prewhitening / unwhitening operations. Moreover, by applying a local whitening procedure to the estimation of the separating filters, the impulse response of the propagation channel can be identified.

This offers the opportunity to constrain the estimation using the knowledge of the medium, and to improve the results.

The whitening procedure still has to be improved in order to come closer to the results obtained with white noise sources.

Two ways of improvement can be investigated: the use of IIR filters and the estimation of non-minimum-phase filters. These non-minimum-phase filters will be mandatory as soon as the filters H11 and H22 are no longer equal to 1. Various blind deconvolution algorithms using non-linear criteria can then be used [1], [2].

These first results confirm that source separation techniques can be used for underwater applications.
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ABSTRACT 

A  particular  source  separation  problem  is  addressed  in  this 
paper.  We  mainly  focus  on  the  separation  of  convolutive 
mixtures  of  rotating  machine  noises  when  the  rotation 
speeds  are  close.  Three  specific  points  are  developed.  In 
the  first  point,  we  study  the  feasibility  of  the  separation 
of  periodic  signals,  regarding  the  hypothesis  of  random 
and non-Gaussian sources. We also discuss the
hypothesis of independence between the sources, as a
function of the rotation speeds. In the second point, we
analyze the performance of the source separation for close
rotation speeds. It is linked to a partial correlation
between  the  machine  noises.  Then,  we  propose  a  new 
method  for  very  close  rotation  speeds,  which  takes  into 
account  this  partial  dependence  between  the  sources. 


1.  Introduction 

A particular source separation problem is addressed in this paper. We mainly focus on the separation of convolutive mixtures of rotating machine noises. The influence of the rotation speeds on the performance of source separation methods is detailed. These methods consist in recovering the signals emitted by p sources s(t) from M observed linear mixtures r(t) of these sources. For that purpose, well-known methods classically use the mere uncorrelation of the emitted source signals but need extra information to achieve their objective. In a general context of source separation, this a priori knowledge is replaced with higher-order statistical information. The only three assumptions are the non-Gaussianity of the source signals, their mutual independence, and the linearity and stationarity of the propagation.

Over the past ten years, many solutions have been proposed which test different measures of statistical independence. They are based on the use of fourth-order moments or cumulants [1], [2], [3], [4], [5], [6], nonlinear functions which also make higher-order statistics appear [7], [8], [9], or contrast functions [10]. Recently, a deflation procedure has been developed in [11]. The linear filters (which characterize the propagation from the sources to the sensors) can be estimated using adaptive or non-adaptive algorithms that minimize, or look for the zeros of, different independence criteria. Some procedures are also based on the maximum likelihood principle [12], [13]. An original geometry-based procedure has also been proposed in the case of n-valued signals [14].

Most of these works reconstruct the source signals from instantaneous mixtures: the observations at any time t only depend on the sources at the same time t. In the case of a convolutive mixture of narrow-band signals, the model can be reduced to a similar instantaneous mixture with complex mixing coefficients. Consequently, similar methods may be used to separate these sources.

In  §3,  we  study  the  feasibility  of  the  separation  of 
periodic  signals,  regarding  the  hypothesis  of  random  and 
non  gaussian  sources  when  the  rotation  speeds  are 
different  enough.  In  §4,  we  study  the  feasibility  of  the 
source  separation  methods,  regarding  the  hypothesis  of 
independent  sources.  More  precisely,  these  methods  test 
the  cancellation  of  second  and  fourth-order  cross- 
cumulants.  Then  we  analyze  the  performances  of  the 
source  separation  versus  the  difference  between  the 
rotation  speeds.  Indeed,  this  quantity  measures  the 
dependence  between  the  sources.  In  §5,  we  propose  a  new 
method  for  very  close  (or  identical)  rotation  speeds,  which 
takes  into  account  this  partial  dependence  between  the 
sources. 


2.  Modelling  of  the  problem 

In  a  general  blind  source  separation  problem,  the  observed 
M-dimensional  data  vector  r(t)  can  be  represented  in 
frequency-domain  by  an  instantaneous  complex  mixture 
for  each  frequency  bin  n,  which  leads  to  the  following 
model : 

$$ R_i(n) = A(n)\, S_i(n) + V_i(n) \qquad (1) $$

where R_i(n) is the N-point Discrete Fourier Transform (DFT) of the ith data block of the observation r(t). S_i(n) represents the DFT of the ith data block of the p-dimensional data vector of the sources s(t). A(n) is an (M×p) matrix which characterizes the linear propagation from sources to sensors, and V_i(n) represents an additive M-dimensional Gaussian noise. The problem consists first in identifying the matrix A(n). After a singular value decomposition, the mixing matrix A(n) is expressed as the product of three matrices:

$$ A(n) = F(n)\, D(n)\, \Pi(n) \qquad (2) $$

where F(n) and Π(n) are two unitary matrices, of sizes (M×M) and (p×p) respectively, and D(n) is an (M×p) diagonal matrix.

The two matrices F(n) and D(n) are identified thanks to second-order statistical criteria. They respectively contain the eigenvectors and the eigenvalues of the spectral matrix of the observation R_i(n).

After projection of the observation vector R_i(n) onto the signal subspace (which is spanned by the eigenvectors associated with the dominant eigenvalues) and normalization, the components of the resulting p-dimensional vector, noted Σ_i(n), are uncorrelated and normalized. They are related to the components of the normalized source vector, noted s'(t), by:

$$ \Sigma_i(n) = \Pi(n)\, S'_i(n) \qquad (3) $$

where S'_i(n) is the DFT of s'(t).

Π(n) can be expressed as a product of Givens rotations and estimated thanks to fourth-order criteria, by testing different measures of the statistical independence of the sources [1], [2], [3], ..., [10], as presented in §1.

3.  Separation  of  periodic  signals 

This part of the paper is devoted to the feasibility of the separation of periodic signals, after Discrete Fourier Transform. As shown in §2 in the case of random signals, the source separation methods rely on the additional information provided by fourth-order statistics. This information only exists under the hypothesis of independent and non-Gaussian sources. Whatever the tested criterion, the variance of the estimate of the matrix Π(n) is inversely proportional to the kurtosis of the sources. In a similar way, in the frequency domain, we study the distance of the DFT to Gaussianity thanks to the spectral kurtosis. It is defined as a section of the general trispectrum of the normalized sources.

Let K(Si(n)) be the kurtosis of Si(n), defined by :


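$$ s(t) = A \sin(2\pi\omega t + \phi) \qquad (5) $$

Let us compute the DFT of s(t) on the ith data block, Si(n), for n close to ω:

$$ S_i(n) = \sum_{k=0}^{N-1} s(k+i)\, e^{-2\pi j k n / N} \qquad (6) $$

$$ S_i(n) = \frac{A}{2j}\, e^{j\left(\phi + 2\pi\omega i + \pi(N-1)(\omega - n/N)\right)}\, F(\omega), \qquad \text{with } F(\omega) = \frac{\sin(\pi(\omega - n/N)N)}{\sin(\pi(\omega - n/N))} $$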


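After some computations, K is equal to [15] [16]:

$$ K = -1 - \left| \frac{1}{L} \sum_{i=1}^{L} e^{j4\pi\omega i} \right|^2 = -1 - \frac{G(2\omega)}{L^2}, \qquad \text{with } G(2\omega) = \left| \frac{\sin(2\pi\omega L)}{\sin(2\pi\omega)} \right|^2 \qquad (7) $$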


If ω is greater than (3/4L), then G(2ω)/L² is smaller than the limit value (2/3π) for large L. The limit of the ratio G(2ω)/L² (for ω=3/4L) is almost constant with L. The maxima of this function decrease as (L sin(2πω))^-2. Consequently, the estimate of the kurtosis K(Si(n)) tends towards (-1) for a fixed value of ω, if L is large enough and n is close to ω. Periodic signals can be decomposed in Fourier series, and K tends towards -1 for the harmonic frequency bins. As a result, the non-cancellation of K provides additional equations that allow the estimation of the matrix Π(n) for the harmonic frequencies n. This point theoretically proves the feasibility of the source separation of rotating machine noises.
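The behaviour of the estimate K for a sinusoid can be checked numerically. The sketch below, given only as an illustration, computes the DFT bin of L sliding data blocks of a sinusoid, estimates the normalized fourth-order cumulant by time averaging, and compares it with the closed form -1 - G(2ω)/L². The amplitude, phase, frequency, block size and number of blocks are arbitrary assumptions, and the function names are hypothetical.

```python
import cmath
import math


def dft_bin(block, n):
    """Value of the N-point DFT of `block` at frequency bin n."""
    N = len(block)
    return sum(x * cmath.exp(-2j * math.pi * k * n / N) for k, x in enumerate(block))


def estimated_spectral_kurtosis(values):
    """Time-averaged estimate of cum(S, S*, S, S*) / cum(S, S*)^2."""
    L = len(values)
    m2 = sum(abs(v) ** 2 for v in values) / L
    m4 = sum(abs(v) ** 4 for v in values) / L
    m2_nc = sum(v * v for v in values) / L  # non-conjugated second-order moment
    return (m4 - 2.0 * m2 ** 2 - abs(m2_nc) ** 2) / m2 ** 2


# Illustrative values (assumptions): amplitude, phase, normalized frequency, sizes.
A, phi, omega = 1.0, 0.3, 0.2031
N, L, n = 64, 200, 13  # bin 13 of 64 is close to omega

signal = [A * math.sin(2 * math.pi * omega * t + phi) for t in range(N + L + 1)]
bins = [dft_bin(signal[i:i + N], n) for i in range(1, L + 1)]

K = estimated_spectral_kurtosis(bins)
G = (math.sin(2 * math.pi * omega * L) / math.sin(2 * math.pi * omega)) ** 2
K_predicted = -1.0 - G / L ** 2
print(K, K_predicted)
```

The two values agree up to the small leakage of the negative-frequency component, which the closed form neglects for n close to ω.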


4.  Separation  of  sinusoids  with  close 
frequencies 


$$ K(S_i(n)) = \frac{\mathrm{cum}(S_i(n), S_i(n)^*, S_i(n), S_i(n)^*)}{\mathrm{cum}(S_i(n), S_i(n)^*)^2} \qquad (4) $$

where cum represents the second- and fourth-order cumulants and * the complex conjugate.

In practice, the source separation methods are applied to a record of the random process Σ_i(n), under the hypothesis of ergodicity. Consequently, the algorithms rely on the non-cancellation of the estimate K of the spectral kurtosis: the statistical quantities are estimated by time averaging over L data blocks.

In the case of periodic (thus deterministic) signals, the statistical kurtosis is meaningless. However, as previously, the estimate K can always be computed by averaging over L data blocks on each component of Si(n).

Let s(t) be a periodic signal of period T. For example, let s(t) be a sinusoid of deterministic frequency and phase, noted ω and φ:


4.1  Discussion  on  the  hypothesis  of  independent 
sources 

In this part, we analyze the performance of the source separation, which depends on the difference between the rotation speeds. Consider two periodic signals s1(t) and s2(t). They are decomposed in Fourier series as combinations of sinusoids of deterministic frequencies ω1 and (ω1+Δω), amplitudes A1 and A2, and phases φ1 and φ2. The term Δω is proportional to the difference between the machine rotation speeds. The source separation rests on the non-cancellation of the estimate K of the spectral kurtosis of each source. This condition is always verified, since K tends towards -1 when L tends towards infinity, whatever the frequency (ω1 or (ω1+Δω)). For random signals, the separation is also based on the independence of the sources. In the case of deterministic signals, this means that the estimates of the second- and fourth-order normalized cross-cumulants tend towards 0 when L tends towards infinity. We suppose that N is large enough, so that there is no interaction due to the DFT between several harmonics.

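They are respectively equal to:

$$ C(S_1, S_2^*) = \frac{1}{L} \sum_{i=1}^{L} S_{1i}(n)\, S_{2i}(n)^* = \frac{A_1 A_2}{4L}\, e^{j(\phi_1 - \phi_2 - \pi(N-1)\Delta\omega)} \sum_{i=1}^{L} e^{-j2\pi\Delta\omega i}\, F(\omega_1)\, F(\omega_1 + \Delta\omega) \qquad (8) $$

The symmetrical fourth-order cross-cumulant is equal to

$$ C(S_1, S_1^*, S_2, S_2^*) = \frac{1}{L} \sum_{i=1}^{L} S_{1i}(n) S_{1i}(n)^* S_{2i}(n) S_{2i}(n)^* - |C(S_1, S_2^*)|^2 - \left| \frac{1}{L} \sum_{i=1}^{L} S_{1i}(n) S_{2i}(n) \right|^2 - C(S_1, S_1^*)\, C(S_2, S_2^*) $$

$$ = \left( \frac{A_1 A_2}{4} \right)^2 F(\omega_1)^2 F(\omega_1 + \Delta\omega)^2 \left[ - \left| \frac{1}{L} \sum_{i=1}^{L} e^{j2\pi\Delta\omega i} \right|^2 - \left| \frac{1}{L} \sum_{i=1}^{L} e^{j2\pi(2\omega_1 + \Delta\omega) i} \right|^2 \right] \qquad (9) $$

$$ (A_1 A_2 F(\omega_1) F(\omega_2))^2 \Big[ (\alpha(n)^* \beta(n)^* + \alpha(n)\beta(n))(1 - |G(\Delta\omega)|^2) + \alpha(n)^* \beta(n)\,(G(2\Delta\omega) - G(\Delta\omega)^2) - |C(S_1, S_2^*)|^2 + \alpha(n)\beta(n)\,(G(-2\Delta\omega) - G(-\Delta\omega)^2) \Big] = 0 \qquad (12) $$

If |G(Δω)|=0, then (11) and (12) can be simplified into:

$$ \beta(n)^* A_1^2 F(\omega_1)^2 + \alpha(n)\,(A_2 F(\omega_1 + \Delta\omega))^2 = 0 \qquad (13) $$

$$ |\beta(n)|^2\, C(S_1, S_1^*, S_1, S_1^*) + |\alpha(n)|^2\, C(S_2, S_2^*, S_2, S_2^*) = 0 \qquad (14) $$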

From their expressions, we conclude that these quantities decrease with L, after normalization, according to

$$ \left| \frac{1}{L} \sum_{i=1}^{L} e^{j2\pi\Delta\omega i} \right| \quad \text{and} \quad - \left| \frac{1}{L} \sum_{i=1}^{L} e^{j2\pi\Delta\omega i} \right|^2 . $$

As detailed in §3, if Δω remains greater than (3/2L), these quantities are smaller than |L sin(2πΔω)|^-1 and |L sin(2πΔω)|^-2 (or |L 2πΔω|^-1 and |L 2πΔω|^-2 for small values of Δω). These quantities tend towards 0 if Δω is not strictly equal to zero. If Δω is equal to zero, then the second- and fourth-order normalized cross-cumulants are respectively equal to 1 and (-1).
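The decay of the normalized cross-cumulants with the product L.Δω can be illustrated numerically. The following sketch evaluates the averaged phasor that drives both quantities and compares its modulus with the standard closed form of a geometric sum, |sin(πΔωL) / (L sin(πΔω))|. The values of L and Δω and the function names are assumptions chosen for the example.

```python
import cmath
import math


def mean_phasor(delta_omega, L):
    """(1/L) * sum over i = 1..L of exp(j 2 pi delta_omega i)."""
    return sum(cmath.exp(2j * math.pi * delta_omega * i) for i in range(1, L + 1)) / L


def closed_form_modulus(delta_omega, L):
    """|sin(pi delta_omega L) / (L sin(pi delta_omega))|, valid for delta_omega != 0."""
    return abs(math.sin(math.pi * delta_omega * L) / (L * math.sin(math.pi * delta_omega)))


L = 100
for delta_omega in (0.0, 0.002, 0.015, 0.05):
    second_order = abs(mean_phasor(delta_omega, L))  # normalized second-order term
    fourth_order = -second_order ** 2                # first term of the normalized (9)
    print(f"L.dw = {L * delta_omega:4.1f}  {second_order:.4f}  {fourth_order:.4f}")
```

For Δω = 0 the two values are 1 and -1, as stated above; as L.Δω grows, both shrink towards zero, which is the regime where the sinusoids behave like independent sources.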

In conclusion, |G(ω)| ∈ [0,1], depending on L and ω.


4.2  Computation  of  the  estimated  sources,  using 
source  separation  methods 


Suppose two mixtures of the two sources s1(t) and s2(t). The estimates Ŝ1(n) and Ŝ2(n) of these sources can be modelled as:

$$ \hat{S}_1(n) = (C(S_1, S_1^*))^{-1/2}\, [S_1(n) + \alpha(n) S_2(n)] $$

$$ \hat{S}_2(n) = (C(S_2, S_2^*))^{-1/2}\, [\beta(n) S_1(n) + S_2(n)] \qquad (10) $$

The cancellation of the second-order and (for example) the symmetrical fourth-order normalized cross-cumulants leads to the two following equations:

$$ \beta(n)^* A_1^2 F(\omega_1)^2 + \alpha(n)\beta(n)^* A_1 A_2 F(\omega_1) F(\omega_1 + \Delta\omega)\, G(-\Delta\omega) + A_1 A_2 F(\omega_1) F(\omega_1 + \Delta\omega)\, G(\Delta\omega) + \alpha(n)\,(A_2 F(\omega_1 + \Delta\omega))^2 = 0 \qquad (11) $$


As the fourth-order cumulants of sources S1(n) and S2(n) are negative, (14) ensures the separation of the sources. The unique solution is α(n)=β(n)=0. Equation (13) is also verified. This is the classical case of independent sources, in the sense that the estimates of the second-order and fourth-order cross-cumulants of the sources are zero.

If 0 < |G(Δω)| < 1, then (12) ensures that α(n)β(n)=0 (if C(S1,S2*)=0). However, equation (11) is not verified for α(n)=0 and β(n)=0, since the sources are correlated. This solution does not lead to the separation of the sources. In that case, the estimated sources will be equal to (for example if β(n)=0):


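$$ \hat{S}_1(n) = (C(S_1, S_1^*))^{-1/2}\, [S_1(n) + \alpha(n) S_2(n)] $$

$$ \hat{S}_2(n) = (C(S_2, S_2^*))^{-1/2}\, [S_2(n)] \qquad (15) $$

$$ \text{with: } \alpha = \frac{-A_1\, F(\omega_1)\, G(\Delta\omega)}{A_2\, F(\omega_1 + \Delta\omega)} $$

We verify that α(n) tends towards zero with G(Δω).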


4.3  Computation  of  the  rejection  level 


We characterize the performance of blind source separation methods in terms of rejection levels. After separation, a residual contribution of the other sources remains in each estimated source. The rejection level is defined as the power spectrum of the contribution of the residual source at frequency ω1 in the estimate of the source at frequency ω1+Δω. In that case it is equal to:

$$ |\alpha(n)|^2\, (C(S_1, S_1^*))^{-1} $$

It decreases with α(n), and therefore with G(Δω).

If |G(Δω)| tends towards 1 (or Δω towards 0), then α tends towards (-A1/A2). This proves that the rejection level is bounded for totally correlated sources.

Figure 1 shows the rejection level as a function of (L.Δω), obtained from simulation results. As expected, it decreases with (L.Δω), since the estimates of the source cross-cumulants get closer and closer to zero.
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5.  Case  of  sinusoids  with  identical 
frequencies 

5.1  Number  of  sensors  equal  to  the  source 
number 

In that case, if Δω (or L) is not large enough, the sources appear correlated and consequently not independent. The source signals can then be modelled more precisely as the sum of several sinusoids of different frequencies and non-Gaussian random noises. In the frequency domain, the kth source S_i^k(n) contains one or several sinusoidal components (for example one) and can be written as:

$$ S_i^k(n) = \frac{A_k}{2j}\, e^{j\left(\phi_k + 2\pi\omega i + \pi(N-1)(\omega - n/N)\right)}\, F(\omega) + N_i^k(n) \qquad (16) $$

The random processes N_i^k(n) are assumed to verify the previous hypotheses of independent signals. We propose a new method of source separation which takes into account the partial dependence between the sources. This partial dependence is due to the phase term:

$$ e^{j\left(2\pi\omega i + \pi(N-1)(\omega - n/N)\right)}\, F(\omega). \qquad (17) $$

The first, second-order step of the source separation method is suppressed: the projection and the normalization of the observation vector Ri(n) in the signal subspace cannot provide orthogonal mixtures of the sources here. The mixture is then not modelled with Givens rotations, unless the sinusoids of the same frequency are considered as a supplementary source in the bin n. This implies constraints on the number of sources and on the number of sensors.

Suppose the first model:

$$ R_i(n) = A(n)\, S_i(n) + V_i(n), \qquad n = 0, \dots, N-1 \qquad (18) $$

with:

$$ S_i^k(n) = \frac{A_k}{2j}\, e^{j\left(\phi_k + 2\pi\omega i + \pi(N-1)(\omega - n/N)\right)}\, F(\omega) + N_i^k(n) $$

A(n) is correctly identified when the estimate Ŝ_i^k(n) of each source contains only one random independent process N_i^k(n) and a sinusoidal component. In that case, the second-order cross-cumulants provide no information, as their value depends on the signal-to-noise ratio in each source. The fourth-order cumulants C(Sk,Sk*,Sj,Sj*) are not equal to zero (for k≠j), contrary to the usual source separation hypotheses. They are equal to the fourth-order cumulants of the sinusoidal components (9), since the estimates of N_i^k(n) and N_i^j(n) must be independent (for k≠j). As the sinusoidal components cannot have been normalized at second order, C(Sk,Sk*,Sj,Sj*) depends on their amplitudes (Ak)²(Aj)².

C(Sk,Sk*,Sj,Sj*) can be normalized, in order to obtain the kurtosis of a sinusoid, which is equal to -1. For that, we have to eliminate the influence of the noises N_i^k(n) and N_i^j(n). The estimate of the fourth-order cross-cumulant is divided by the squared modulus of the second-order cross-cumulant (and not the cumulant):

$$ C = \frac{C(S_k(n), S_k(n)^*, S_j(n), S_j(n)^*)}{|C(S_k(n), S_j(n)^*)|^2} \qquad (19) $$

If the estimate of each source S_i^k(n) contains only one random independent process N_i^k(n) and a sinusoidal component, then C is equal to -1. It is the only constant quantity that characterizes the sources. Conversely, we prove that the estimate of A(n) for which the estimated sources verify C=-1 achieves the source separation. Consequently, there are no spurious solutions.

Indeed:

$$ C(S_k(n), S_k(n)^*, S_j(n), S_j(n)^*) = -|E(S_k(n) S_j(n)^*)|^2 + E(S_k(n) S_k(n)^* S_j(n) S_j(n)^*) - E(|S_k(n)|^2)\, E(|S_j(n)|^2) \qquad (20) $$

Using equation (20), the condition (C=-1) is equivalent to (21) for each couple of sources:

$$ E(S_k(n) S_k(n)^* S_j(n) S_j(n)^*) = E(|S_k(n)|^2)\, E(|S_j(n)|^2) \qquad (21) $$

Equation (21) holds if and only if the sources are independent, or have constant modulus, or are the sum of independent noises and constant-modulus sources. The Discrete Fourier Transform of periodic signals verifies this constant-modulus condition at the harmonic frequency bins:

$$ |S_i^k(n)| = \frac{A_k}{2}\, |F(\omega)| \qquad (22) $$

|S_i^k(n)| does not depend on the time index i. Consequently, it is a constant-modulus source.

5.2  Sensor  number  superior  to  the  source 
number 

In that case, the first step of the source separation method is unchanged: the projection of the observation vector Ri(n) in the signal subspace provides uncorrelated signals if the sinusoids of the same frequency are considered as a supplementary source in the bin n. As previously, the signals, after projection and normalization, are linked to the sources by the matrix Π(n). This matrix can be expressed as a product of Givens rotations. When the number of sensors is greater than the number of sources, the matrix Π(n) is correctly identified when the estimates of the sources are independent. It means that each of them contains only one random independent process N_i^k(n) or one sinusoidal component. In that case, the fourth-order cumulants C(Sk,Sk*,Sj,Sj*), for k≠j, are equal to zero (as in the usual hypotheses of source separation methods).

However, the complete estimate of the source S_i^k(n) must contain one independent noise and one sinusoidal component, in order to separate the vibrations coming from two different machines. Besides, the estimates of the sources are considered as normalized sources, which is not realistic. The power of each component of the sources is then estimated with the help of second-order cross-cumulants between the sensors and the normalized sources [15].

6.  Conclusion 

A particular source separation problem is addressed in this paper. We mainly focus on the separation of convolutive mixtures of rotating machine noises when the rotation speeds are close. Three specific points are developed. In the first point, we study the feasibility of the separation of periodic signals, regarding the hypothesis of random and non-Gaussian sources. We also discuss the hypothesis of independence between the sources, as a function of the rotation speeds. In the second point, we analyze the performance of the source separation for close rotation speeds. It is linked to a partial correlation between the machine noises. Then, we propose a new method for very close rotation speeds, which takes into account this partial dependence between the sources.
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Abstract 

We present a new necessary and sufficient condition for the blind separation of sources having non-zero kurtosis from their linear mixtures. It is shown here that a new blind separation criterion based on both odd (f(y) = y^3) and even (f(y) = y^2) functions presents desirable solutions, provided that all source signals have negative kurtosis (sub-Gaussian) or all have positive kurtosis (super-Gaussian). Based on this new separation criterion, a linear feedforward network with lateral feedback connections is constructed. Both theoretical and computer simulation results are presented.


1.  Introduction 

Blind source separation is a fundamental problem encountered in many applications such as array signal processing, sonar, digital communications, and some biomedical applications. The goal of blind source separation is to recover the independent source signals from their linear mixtures without knowledge of the mixing matrix. In particular, when the propagation medium is slowly changing (the mixing matrix is slowly changing), an adaptive system for blind source separation is necessary.

An adaptive blind source separation method was first introduced by Jutten and Herault [11]. Later it was further developed by others [12, 7, 4, 1, 2, 8, 5]. Most of the existing methods mentioned above are based on nonlinear odd functions (for example, f(y) = y^3 or f(y) = tanh(y)) because they assume that the probability distributions of all source signals are symmetric. Recently Choi et al. [6] have shown that a new learning algorithm based on an even nonlinearity (e.g. a quadratic function, f(y) = y^2) enables the separation of source signals having non-zero skewness (the 3rd-order cumulant) without spurious equilibria. We extend this result to the separation of source signals having the same sign of non-zero kurtosis (the 4th-order cumulant). A new necessary and sufficient condition for blind source separation is presented. It is shown here that the source signals can be separated by a linear transformation if and only if all the 2nd- and 4th-order cross-cumulants of the output of the network are zero, provided that the source signals are statistically independent and each of them has the same sign of non-zero kurtosis. Based on this criterion, a linear feedforward network with lateral feedback connections is constructed with associated adaptive learning algorithms. The proposed learning algorithms are based on both cubic and quadratic functions to force all 2nd- and 4th-order cross-cumulants of the output of the network to zero.

2. A New Blind Source Separation Criterion

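Consider the case where the observation vector $\mathbf{x}(t) \in \mathbb{R}^n$ and the source vector $\mathbf{s}(t) \in \mathbb{R}^n$ are related by

$$\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t), \tag{1}$$

where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is the mixing matrix. The problem of blind source separation is to recover the source signals $\mathbf{s}(t)$ from the observation vector $\mathbf{x}(t)$ without knowledge of the mixing matrix $\mathbf{A}$. Let $\mathbf{y}(t)$ be the output of the network, i.e., $\mathbf{y}(t) = \mathbf{W}(t)\mathbf{x}(t) = \mathbf{W}(t)\mathbf{A}\mathbf{s}(t)$. It is desired to update $\mathbf{W}(t)$ such that the global system $\mathbf{G}(t) = \mathbf{W}(t)\mathbf{A}$ converges to a matrix

$$\mathbf{G} = \mathbf{P}\boldsymbol{\Lambda} \tag{2}$$

as $t \to \infty$, for some permutation matrix $\mathbf{P}$ and nonsingular diagonal matrix $\boldsymbol{\Lambda}$.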

Throughout this paper, the following assumptions are made:

A1: $\mathbf{A} \in \mathbb{R}^{n \times n}$ is nonsingular.

A2: At each time $t$, the components of $\mathbf{s}(t)$ are statistically independent.

A3: Each component of $\mathbf{s}(t)$ is a zero-mean ergodic stationary process with a non-zero variance.

A4: Each component of $\mathbf{s}(t)$ has a non-zero kurtosis, i.e.,

$$\mathrm{cum}_4\{s_i(t)\} = E\{s_i^4(t)\} - 3E^2\{s_i^2(t)\} \neq 0, \quad \text{for } i = 1, \ldots, n, \tag{3}$$

where $E$ denotes the expectation operator.
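As a small illustration of definition (3), the following Python sketch estimates the 4th-order cumulant of a zero-mean sequence. The function name `cum4` and the two test sequences are assumptions made for illustration only; they do not come from the paper.

```python
def cum4(samples):
    """Sample version of (3): E{s^4} - 3 E^2{s^2} for a zero-mean sequence."""
    n = len(samples)
    m2 = sum(s * s for s in samples) / n
    m4 = sum(s ** 4 for s in samples) / n
    return m4 - 3.0 * m2 * m2

# Balanced binary sequence taking the values +1 and -1 (zero mean, unit variance).
binary = [1.0, -1.0] * 500

# Zero-mean, unit-variance sequence that is mostly zero with rare large values.
root5 = 5.0 ** 0.5
sparse = ([0.0] * 8 + [root5, -root5]) * 100

kurt_binary = cum4(binary)   # negative: sub-Gaussian
kurt_sparse = cum4(sparse)   # positive: super-Gaussian
```

The binary sequence has negative kurtosis and the sparse one positive kurtosis, which are the two source classes that assumption A4 and Theorem 1 distinguish.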

Without loss of generality, we can assume that the variance of each source signal is unity, i.e., $\mathbf{R}_s = E\{\mathbf{s}(t)\mathbf{s}^T(t)\} = \mathbf{I}$, where $\mathbf{I}$ denotes the identity matrix. Let the vector sequence $\mathbf{y}(t)$ be a linear mixture of the source vector sequence $\mathbf{s}(t)$,

$$\mathbf{y}(t) = \mathbf{G}\mathbf{s}(t). \tag{4}$$

Let us define two different 4th-order cumulant matrices of $\mathbf{y}(t)$ as follows:

$$[\mathbf{C}_{13y}]_{ij} = \mathrm{cum}_{13}\{y_i(t), y_j(t)\} = \mathrm{cum}\{y_i(t), y_j(t), y_j(t), y_j(t)\}, \tag{5}$$

$$[\mathbf{C}_{22y}]_{ij} = \mathrm{cum}_{22}\{y_i(t), y_j(t)\} = \mathrm{cum}\{y_i(t), y_i(t), y_j(t), y_j(t)\}, \tag{6}$$

where $[\cdot]_{ij}$ denotes the $(i,j)$th component of the matrix. The covariance matrix $\mathbf{R}_y = E\{\mathbf{y}(t)\mathbf{y}^T(t)\}$ is decomposed as

$$\mathbf{R}_y = \mathbf{G}\mathbf{G}^T. \tag{7}$$

For the following decomposition, we introduce the Hadamard product (element-wise product). Let $\circ$ denote the Hadamard product. For example, the $(i,j)$th element of $\mathbf{G} \circ \mathbf{G}$ is $g_{ij}^2$.

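Property 1 The 4th-order cumulant matrices $\mathbf{C}_{13y}$ and $\mathbf{C}_{22y}$ have the following decompositions:

$$\mathbf{C}_{13y} = \mathbf{G}\mathbf{K}_s[\mathbf{G} \circ \mathbf{G} \circ \mathbf{G}]^T, \tag{8}$$

$$\mathbf{C}_{22y} = [\mathbf{G} \circ \mathbf{G}]\mathbf{K}_s[\mathbf{G} \circ \mathbf{G}]^T, \tag{9}$$

where $\mathbf{K}_s$ is the nonsingular diagonal matrix whose $i$th diagonal element is $\mathrm{cum}_4\{s_i(t)\}$.

Proof: It can be proved by using the multi-linearity of cumulants. Consider the $(i,j)$th element of $\mathbf{C}_{13y}$. The time index $t$ is omitted throughout the proof ($y_i = y_i(t)$, $s_i = s_i(t)$).

$$\begin{aligned}
\mathrm{cum}_{13}\{y_i, y_j\}
&= \mathrm{cum}\Big\{\sum_{l_1} g_{il_1}s_{l_1},\ \sum_{l_2} g_{jl_2}s_{l_2},\ \sum_{l_3} g_{jl_3}s_{l_3},\ \sum_{l_4} g_{jl_4}s_{l_4}\Big\} \\
&= \sum_{l_1}\sum_{l_2}\sum_{l_3}\sum_{l_4} g_{il_1}g_{jl_2}g_{jl_3}g_{jl_4}\,\mathrm{cum}\{s_{l_1}, s_{l_2}, s_{l_3}, s_{l_4}\} \\
&= \sum_{k} g_{ik}\,g_{jk}^3\,\mathrm{cum}_4\{s_k\},
\end{aligned}$$

where the last equality holds because, by the independence assumption A2, the joint cumulant of the sources vanishes unless $l_1 = l_2 = l_3 = l_4$.

Similarly, the decomposition (9) can be proved.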
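Property 1 can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions, not part of the paper: it takes a hypothetical 2x2 matrix `G` and two independent equiprobable binary sources (kurtosis -2 each), computes the output cumulants exactly by enumerating the four joint outcomes, and compares them with the right-hand sides of (8) and (9).

```python
from itertools import product

# Illustrative assumptions: a hypothetical global matrix and two independent
# equiprobable +1/-1 sources (zero mean, unit variance, kurtosis -2).
G = [[0.8, -0.3], [0.5, 1.1]]
KAPPA = [-2.0, -2.0]
OUTCOMES = list(product([-1.0, 1.0], repeat=2))  # four equally likely outcomes

def expect(fn):
    """Exact expectation of fn(y), y = G s, over all joint source outcomes."""
    total = 0.0
    for s in OUTCOMES:
        y = [sum(G[i][k] * s[k] for k in range(2)) for i in range(2)]
        total += fn(y)
    return total / len(OUTCOMES)

def cum13(i, j):
    # cum{y_i, y_j, y_j, y_j} for zero-mean variables
    return (expect(lambda y: y[i] * y[j] ** 3)
            - 3.0 * expect(lambda y: y[i] * y[j]) * expect(lambda y: y[j] ** 2))

def cum22(i, j):
    # cum{y_i, y_i, y_j, y_j} for zero-mean variables
    return (expect(lambda y: y[i] ** 2 * y[j] ** 2)
            - expect(lambda y: y[i] ** 2) * expect(lambda y: y[j] ** 2)
            - 2.0 * expect(lambda y: y[i] * y[j]) ** 2)

def rhs13(i, j):
    # (i, j)th element of G K_s [G o G o G]^T
    return sum(G[i][k] * KAPPA[k] * G[j][k] ** 3 for k in range(2))

def rhs22(i, j):
    # (i, j)th element of [G o G] K_s [G o G]^T
    return sum(G[i][k] ** 2 * KAPPA[k] * G[j][k] ** 2 for k in range(2))

max_err = max(
    max(abs(cum13(i, j) - rhs13(i, j)), abs(cum22(i, j) - rhs22(i, j)))
    for i in range(2) for j in range(2)
)
```

For this example `max_err` is at the level of floating-point rounding, as expected from the multi-linearity argument in the proof.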

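Theorem 1 Let the source signals $\mathbf{s}(t)$ satisfy the assumptions given in A1 through A4. Suppose that the kurtoses of all source signals $\mathbf{s}(t)$ are either all positive or all negative. Then $\mathbf{G}$ has the decomposition (2), i.e.,

$$\mathbf{G} = \mathbf{P}\boldsymbol{\Lambda}, \tag{10}$$

if and only if the following conditions are satisfied:

$$\mathbf{G}\mathbf{G}^T = \boldsymbol{\Lambda}_1, \tag{11}$$

$$\mathbf{G}\mathbf{K}_s[\mathbf{G} \circ \mathbf{G} \circ \mathbf{G}]^T = \boldsymbol{\Lambda}_2, \tag{12}$$

$$[\mathbf{G} \circ \mathbf{G}]\mathbf{K}_s[\mathbf{G} \circ \mathbf{G}]^T = \boldsymbol{\Lambda}_3, \tag{13}$$

where $\boldsymbol{\Lambda}_1$, $\boldsymbol{\Lambda}_2$, and $\boldsymbol{\Lambda}_3$ are nonsingular diagonal matrices.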

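Proof: For the sake of simplicity, let us assume that $\boldsymbol{\Lambda}_1 = \mathbf{I}$. Then $\mathbf{G}$ is an orthogonal matrix. Let $\boldsymbol{\Lambda}_2 = \mathrm{diag}\{\alpha_1, \ldots, \alpha_n\}$ and $\mathbf{K}_s = \mathrm{diag}\{\kappa_1, \ldots, \kappa_n\}$. From (12), the $(i,j)$th element $g_{ij}$ of the matrix $\mathbf{G}$ should satisfy $g_{ij} = 0$ or $g_{ij}^2 = \alpha_i / \kappa_j$. Using this fact, the $(i,j)$th element ($i \neq j$) of the LHS in (13) is given by

$$\alpha_i \alpha_j \sum_{k \in \mathcal{K}} \frac{1}{\kappa_k}, \tag{14}$$

where $\mathcal{K}$ is a set for the collection of all possible cases where $\mathbf{G} \neq \mathbf{P}\tilde{\mathbf{I}}$ for some permutation matrix $\mathbf{P}$ and the diagonal matrix $\tilde{\mathbf{I}}$ whose diagonal elements are either $+1$ or $-1$. Note that $\alpha_i \neq 0$ for all $i$ and, since all kurtoses have the same sign, $\sum_{k \in \mathcal{K}} 1/\kappa_k \neq 0$ whenever $\mathcal{K}$ is non-empty. Because (13) requires the off-diagonal elements to vanish, $\mathcal{K}$ is the empty set. Therefore, $\mathbf{G} = \mathbf{P}\tilde{\mathbf{I}}$. $\square$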

Note that most of the existing algorithms satisfy only the first two conditions (11) and (12), so that they could converge to spurious equilibria. For instance, it can be easily shown that the following $\mathbf{G}$ satisfies (11) and (12), but it is not a solution to blind source separation:


G  = 


1 


(15) 


3.  Implementation 

We present two different learning algorithms whose equilibria satisfy the conditions given in (11), (12), and (13). The network, as shown in Figure 1, consists of a feedforward network $\mathbf{W}(t)$ and a feedback network $\mathbf{U}(t)$. The feedback network $\mathbf{U}(t)$ can be a lower triangular matrix with zero diagonal elements (these feedback connections are referred to as lateral feedback connections) or a full matrix (a fully recurrent network).

Figure 1. Our approach to blind source separation


3.1. The network with lateral feedback connections

We consider the case where $\mathbf{U}(t)$ is a lower triangular matrix with zero diagonal elements, as shown in Figure 2. The output of the network $y_i(t)$, $i = 1, \ldots, n$, is described by

$$y_i(t) = \sum_{j=1}^{n} w_{ij}(t)x_j(t) + \sum_{j<i} u_{ij}(t)y_j(t), \tag{16}$$

where $w_{ij}(t)$ is the synaptic weight between the $i$th output $y_i(t)$ and the $j$th input $x_j(t)$, and $u_{ij}(t)$ is the lateral connection between $y_i(t)$ and $y_j(t)$. Or in matrix form,

$$\mathbf{y}(t) = \mathbf{W}(t)\mathbf{x}(t) + \mathbf{U}(t)\mathbf{y}(t), \tag{17}$$

where $\mathbf{W}(t) = [w_{ij}(t)]_{n \times n}$ is a full matrix and $\mathbf{U}(t) = [u_{ij}(t)]_{n \times n}$ is a lower triangular matrix with $u_{ij}(t) = 0$ for $i < j$.

Figure 2. The structure of the feedforward network with lateral feedback connections

For such a neural network, we have developed the following adaptive learning algorithm:

$$\cdots = \eta_w(t)\big\{\alpha\mathbf{I} - \alpha\,\mathbf{y}(t)\mathbf{y}^T(t) - \mathrm{sgn}(\kappa_s)\boldsymbol{\Lambda}(t) + \mathrm{sgn}(\kappa_s)[\mathbf{y}(t) \circ \mathbf{y}(t) \circ \mathbf{y}(t)]\mathbf{y}^T(t)\big\}\mathbf{W}(t), \tag{18}$$

$$\cdots \quad \text{for } i > j, \tag{19}$$

where $\boldsymbol{\Lambda}(t)$ is a diagonal matrix whose $i$th diagonal element is $y_i^4(t)$. For sub-Gaussian source signals, $\mathrm{sgn}(\kappa_s) = -1$, and for super-Gaussian source signals, $\mathrm{sgn}(\kappa_s) = +1$. When the convergence of the learning algorithm (18) and (19) is achieved, we have

$$E\{y_i(t)y_j(t)\} = 0, \tag{20}$$

$$E\{y_i(t)y_j^3(t)\} = 0, \tag{21}$$

$$E\{y_i^2(t)y_j^2(t)\} = 1, \tag{22}$$

for $i \neq j$, and

$$E\{y_i^2(t)\} = 1, \quad \text{for } i = 1, \ldots, n. \tag{23}$$

Note that

$$\mathrm{cum}_{22}\{y_i(t), y_j(t)\} = E\{y_i^2(t)y_j^2(t)\} - E\{y_i^2(t)\}E\{y_j^2(t)\} - 2E^2\{y_i(t)y_j(t)\}. \tag{24}$$

It can be easily seen that when the conditions (20), (21), (22), and (23) are satisfied, all 2nd- and 4th-order cross-cumulants of $\mathbf{y}(t)$ become zero. By Theorem 1, these equilibria are desirable solutions.

3.2. The network with full feedback connections


We consider the case where the feedback connection matrix $\mathbf{U}(t)$ is a full matrix (a fully recurrent network including the self-inhibition connections from each output node back to itself). The output of the network $\mathbf{y}(t)$ is still given by

$$\mathbf{y}(t) = \mathbf{W}(t)\mathbf{x}(t) + \mathbf{U}(t)\mathbf{y}(t). \tag{25}$$

The feedforward connection matrix $\mathbf{W}(t)$ is trained to force all 4th-order cross-cumulants of the output $\mathbf{y}(t)$ to vanish. The decorrelation learning algorithm for the feedback connections $\mathbf{U}(t)$ was motivated by the natural gradient-based algorithm [1]. We have developed the following learning algorithm

$$\cdots = \eta_w(t)\big\{\boldsymbol{\Gamma}(t) - [\mathbf{y}(t) \circ \mathbf{y}(t)][\mathbf{y}(t) \circ \mathbf{y}(t)]^T - \mathrm{sgn}(\kappa_s)\,\mathbf{y}(t)[\mathbf{y}(t) \circ \mathbf{y}(t) \circ \mathbf{y}(t)]^T + \mathrm{sgn}(\kappa_s)[\mathbf{y}(t) \circ \mathbf{y}(t) \circ \mathbf{y}(t)]\mathbf{y}^T(t)\big\}\mathbf{W}(t), \tag{26}$$

$$\cdots = \eta_u(t)\{\mathbf{I} - \mathbf{U}(t)\}\{\mathbf{I} - \mathbf{y}(t)\mathbf{y}^T(t)\}, \tag{27}$$

where the $i$th diagonal element of $\boldsymbol{\Gamma}(t)$ is $y_i^4(t)$ and all off-diagonal elements of $\boldsymbol{\Gamma}(t)$ are unity. It can be shown that stable equilibria of (26) and (27) satisfy the conditions (11), (12), and (13).

4. Discussion


We now discuss the extension of Theorem 1 to the case where sub-Gaussian (negative kurtosis) and super-Gaussian (positive kurtosis) source signals are mixed. Even in this case, Theorem 1 still holds if no sum of the inverse kurtoses of two or more source signals is zero, i.e., $\sum_{i \in \mathcal{K}} 1/\kappa_i \neq 0$ for any combination $\mathcal{K}$ of two or more indices out of $\{1, 2, \ldots, n\}$. This condition guarantees that all diagonal elements of $\boldsymbol{\Lambda}_3$ are non-zero. However, in situations where $\mathbf{x}(t)$ contains mixtures of both sub-Gaussian and super-Gaussian source signals, there is a possibility that the sum of the inverse kurtoses might be zero. Moreover, the learning algorithms become unstable. Thus additional techniques are required [3].

The learning algorithm (18) and (19) associated with the first proposed network works well when all sources are sub-Gaussian or all sources are super-Gaussian (but not for mixtures of sub- and super-Gaussian sources). For the stability of the learning algorithm, we introduce the function $\mathrm{sgn}(\kappa_s)$. At this moment, we do not have a theoretical stability analysis.

The neural network architecture shown in Figures 1 and 2 can be used for extraction of source signals one by one. Maximization or minimization of the normalized kurtosis with a deflation approach is found in [9, 10]. We are developing the learning algorithm for sequential extraction of sources in the hierarchical network shown in Figure 2.

5.  Computer  Simulations 

Computer simulations were conducted to evaluate the performance of the proposed algorithms (18), (19) and (26), (27). The global system $\mathbf{G} = (\mathbf{I} + \mathbf{U})^{-1}\mathbf{W}\mathbf{A}$ is evaluated to check the efficiency of the algorithm. The global system $\mathbf{G}$ approaches a generalized permutation matrix when the learning algorithm converges to desirable solutions.
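The convergence check described above can be sketched as a simple test on the moduli of the entries of the global system. The function name `is_generalized_permutation`, the tolerance, and the example matrices are assumptions for illustration; the paper itself inspects the plotted moduli.

```python
def is_generalized_permutation(G, tol=1e-2):
    """Return True if every row of G has exactly one dominant entry (all other
    entries below tol times the row maximum in modulus) and the dominant
    entries of different rows sit in different columns."""
    n = len(G)
    dominant_cols = []
    for row in G:
        peak = max(abs(g) for g in row)
        if peak == 0.0:
            return False  # an all-zero row recovers no source
        big = [j for j, g in enumerate(row) if abs(g) > tol * peak]
        if len(big) != 1:
            return False
        dominant_cols.append(big[0])
    return sorted(dominant_cols) == list(range(n))

# Hypothetical global systems after learning: one separating, one not.
separated = [[0.0, 1.3, 0.001],
             [-0.7, 0.002, 0.0],
             [0.0, 0.0, 2.1]]
still_mixed = [[0.7, 0.7],
               [-0.7, 0.7]]

ok_separated = is_generalized_permutation(separated)
ok_mixed = is_generalized_permutation(still_mixed)
```

Normalizing by the row maximum keeps the test insensitive to the arbitrary scaling of each recovered source, which matches the scaling ambiguity allowed by the diagonal matrix in (2).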

5.1.  Computer  Simulation  1 

Three i.i.d. binary sources drawn from a uniform distribution (negative kurtosis) are used. The mixing matrix $\mathbf{A}$ is chosen randomly as

$$\mathbf{A} = \begin{bmatrix} 0.1549 & 0.1405 & 0.3916 \\ 0.5258 & 0.2041 & 0.9370 \\ 0.2047 & 0.5108 & 0.4310 \end{bmatrix} \tag{28}$$

All elements of $\mathbf{G}(t) = (\mathbf{I} + \mathbf{U}(t))^{-1}\mathbf{W}(t)\mathbf{A}$ are plotted in Figures 3 and 4.


Figure 3. The modulus of each element of the global system $\mathbf{G}(t)$ for the first proposed network with the learning algorithm (18), (19). The learning rates were set to $\eta_w(t) = 0.003$, $\eta_u(t) = 0.0007$.


Figure 4. The modulus of each element of the global system $\mathbf{G}(t)$ for the second proposed network with the learning algorithm (26), (27). The learning rates were set to $\eta_w(t) = 0.003$, $\eta_u(t) = 0.009$.

Figure 5. The sampling rate was 10 kHz. Each plot shows the signal for a duration of 0.1 second. The original source signals are shown in the first row. Two sensor outputs are shown in the second row. The signals recovered by using the learning algorithm (18) and (19) are shown in the third row. A similar result can be obtained by using (26) and (27).


5.2.  Computer  Simulation  2 

Two independent source signals having positive kurtosis were used to generate the observation data $\mathbf{x}(t)$ by using a randomly generated mixing matrix $\mathbf{A}$. The two source signals are given by

Si{t)  =  siii7{uxt)j 

S2{t)  =  sin^{uj2t).  (29) 

Figure 5 shows the original source signals $\mathbf{s}(t)$, the sensor outputs $\mathbf{x}(t)$, and the recovered signals $\hat{\mathbf{s}}(t)$.

6.  Conclusion 

In this paper, we presented a new necessary and sufficient condition for blind source separation. It was shown that if all 2nd- and 4th-order cross-cumulants of the output $\mathbf{y}(t)$ become zero, then the source signals can be recovered. This was proved using algebraic properties of 2nd- and 4th-order cumulants for the case of $n$ sources. Based on this criterion, we constructed a linear feedforward network followed by a feedback network. Two different learning algorithms were presented and confirmed by computer simulations.

Abstract

In this paper we develop an algorithm to improve the accuracy of the estimation of the directions of arrival of wide-band sources. It is well known that when the noise cross-spectral matrix is unknown, these estimates may be grossly inaccurate. A robust method for the source characterisation problem in the presence of noise with an unknown cross-spectral matrix is developed by combining the fourth-order cumulant, for suppression of the Gaussian noise, the transformation matrices, for estimating the coherent signal subspace, and a noneigenvector algorithm. We show that the performance of bearing estimation algorithms improves substantially when our robust algorithm is used. Simulation results are presented for the case of an unknown noise spectral matrix.


1.  Introduction 

The estimation of the directions of arrival of multiple narrow-band or wide-band sources in noise is a classic problem in array signal processing [1-6]. Many bearing estimation procedures have been reported in the literature, among which the various eigenanalysis-based methods have been the focus of many studies [1-5]. The eigenstructure procedure is computationally costly when the number of sensors is large. These methods are also dependent on the structure of the noise cross-spectral matrix. A fundamental assumption for most direction finding algorithms developed in the last decade is that the noise is spatially and temporally white, or that the spatial correlation structure of the background noise is known to within a multiplicative scalar. Then, the localisation algorithm can usually be modified in a straightforward manner to include it in the treatment. In practice, this assumption is rarely fulfilled. This is because the noise has several origins, such as traffic noise, ambient sea noise, or flow noise; in addition, low-power or undetected source signals, which are often spatially correlated, are sometimes assimilated to the noise. In recent years, there has been a growing interest in the problem of improving high resolution eigenstructure techniques with the objective of lowering the signal-to-noise ratio resolution threshold or of handling spatially colored noise [7-13]. The ambient noise is unknown in practice; therefore its modelling or its estimation is necessary. The methods developed for this problem are very few and there is no definitive solution. There are some practical methods: in [11] two methods are obtained by optimisation of a criterion and by using AR or ARMA modelling of the noise. In [12-13] the spatial correlation matrix of the noise is modelled by known Bessel functions. In [8] the ambient noise spectral matrix is modelled by a sum of Hermitian matrices known up to a multiplicative scalar. In [14] this estimate is obtained by measuring the array cross-spectral matrix when no signals are present. This procedure assumes that the noise does not vary with time, which is not fulfilled in several application domains. Another possibility [15] arises when the correlation structure is known to be invariant under a translation or rotation. The so-called covariance differencing technique can then be applied to reduce the noise influence. This method requires two identical translated and/or rotated measurements of the array cross-spectral matrix and hypothesises the invariance of the noise cross-spectral matrix, while the source signals change between the two measurements. The noise cross-spectral matrix is eliminated by simple subtraction. However, this method cannot be applied when the source cross-spectral matrix satisfies the same invariance property or when only one measurement is available. In [8-9] a particular modelling structure of the noise spectral matrix, which takes into account the characteristics of the noise relative to its origins, is given. In general, even if the individual treatments differ in these articles, the obtained noise structure matrix is the same.

In this paper, the directions of arrival of the sources are estimated in the presence of Gaussian noise with an unknown spectral matrix. By means of the property of the fourth-order cumulant, the noise contribution is suppressed. Thus we use the coherent propagator method with spatial fourth-order cumulant matrices instead of the spectral matrices that are used by the coherent signal subspace algorithm. The focusing matrices are used in order to transform the propagator of the fourth-order cumulant matrices at each analysis frequency bin into the propagator at the selected frequency.


2.  Problem  formulation 


Consider a uniform linear array composed of $N$ identical sensors separated from each other by a distance $d$. Let $P$ ($P < N$) sources impinge on the array from the directions $\{\theta_1, \ldots, \theta_P\}$. The signal received at the $i$th sensor can be expressed as:

$$r_i(t) = \sum_{p=1}^{P} s_p(t - \tau_{ip}) + n_i(t), \quad i = 1, \ldots, N, \quad -T/2 < t < T/2 \tag{1}$$

where $n_i(t)$ is the additive noise at the $i$th sensor, $s_p(t)$ is the signal emitted by the $p$th source and $\tau_{ip}$ is the propagation delay associated with the $p$th source and the $i$th sensor. Rewriting (1) in matrix notation, in the frequency domain, we obtain:

$$\mathbf{r}(f_j) = \mathbf{A}(f_j)\mathbf{s}(f_j) + \mathbf{n}(f_j), \quad j = 1, \ldots, M \tag{2}$$

where $\mathbf{r}(f_j)$, $\mathbf{s}(f_j)$ and $\mathbf{n}(f_j)$ are the Fourier transforms of the observation, signal and noise vectors, and $\mathbf{A}(f_j)$ is the $N \times P$ transfer matrix of the source-sensor array system with respect to some chosen reference point, $\mathbf{A}(f_j) = [\mathbf{a}(f_j, \theta_1), \mathbf{a}(f_j, \theta_2), \ldots, \mathbf{a}(f_j, \theta_P)]$; $\mathbf{a}(f_j, \theta_i)$ is the steering vector of the array toward the direction $\theta_i$ at the frequency $f_j$. It is assumed that $\mathbf{A}(f_j)$ is full rank. In other words, for each $f_j$ the steering vectors $\mathbf{a}(f_j, \theta_i)$, $j = 1, \ldots, M$ and $i = 1, \ldots, P$, are linearly independent. For a large $T$, samples of the observation vector at different frequency bins are uncorrelated.

Assume that the signals and the additive noises are uncorrelated, and that the noises are Gaussian. The spatial cross-spectral matrix of the observation vector at frequency $f_j$ is given by $\boldsymbol{\Gamma}(f_j) = E\{\mathbf{r}(f_j)\mathbf{r}^+(f_j)\}$; using the above assumptions, we obtain:

$$\boldsymbol{\Gamma}(f_j) = \mathbf{A}(f_j)\boldsymbol{\Gamma}_s(f_j)\mathbf{A}^+(f_j) + \boldsymbol{\Gamma}_n(f_j) \tag{3}$$

where the superscript $+$ represents the Hermitian transpose.

The  universal  spectral  matrix  in  the  coherent 
subspace  method  can  be  shown  as  : 


(4) 


where $\mathbf{T}(f_j)$ is the $j$-th focusing matrix, such that $\mathbf{T}(f_j)\mathbf{A}(f_j) = \mathbf{A}(f_c)$, where $f_c$ is the selected focusing frequency. In [16] a unitary version of the coherent subspace method is introduced, which is based on choosing $\mathbf{T}(f_j)$ to

$$\min_{\mathbf{T}(f_j)} \|\mathbf{A}(f_c) - \mathbf{T}(f_j)\mathbf{A}(f_j)\|^2 \quad \text{s.t. } \mathbf{T}^+(f_j)\mathbf{T}(f_j) = \mathbf{I}, \quad j = 1, \ldots, M \tag{5}$$

where $\mathbf{A}(f_c)$ is the focusing location matrix of the array and $\|\cdot\|$ is the Frobenius matrix norm. The matrix $\mathbf{T}(f_j)$ that solves (5) is the focusing matrix of the unitary coherent subspace method.

The focusing matrices transform the signal subspace at the $j$-th frequency bin into the focusing signal subspace, and it has been shown that the unitary coherent subspace method does not create focusing loss [16]. The focusing loss is defined as the ratio of the signal-to-noise ratio after focusing to the signal-to-noise ratio before focusing. To determine the focusing matrix $\mathbf{T}(f_j)$ from (5), it is assumed that the matrices $\mathbf{A}(f_j)$ and $\mathbf{A}(f_c)$ are known. In practice, an ordinary beamformer is used to estimate the initial azimuths of the sources. These angles are then used as the focusing angles. The spatial fourth-order cumulant matrix at the $j$-th temporal frequency bin is defined as


H(/y)  =  C«m 


V/,) 

...r^  •(/,.)]} 


where "Cum" is the abbreviation for cumulant and * denotes the complex conjugate. Estimate the following fourth-order cumulant matrix:

Substituting (1) into the above expression and taking into account that the noise is Gaussian, after some calculations [17-18] we obtain


of the source signals. In equation (6) the noise spectral matrix is suppressed because the noise is assumed to be Gaussian, so that its fourth-order cumulants are zero. In practice the fourth-order cumulant matrix $\mathbf{H}(f_j)$ is estimated from the set of received vectors $\mathbf{r}_k(f_j)$, $j = 1, \ldots, M$ and $k = 1, \ldots, K$.
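The noise-suppression property used above can be illustrated with a short simulation. This is a simplified real-valued sketch under assumptions of our own (a binary non-Gaussian signal, unit-variance Gaussian noise, a fixed random seed); the paper works with complex Fourier coefficients and cumulant matrices.

```python
import random

def cum4(samples):
    """Sample 4th-order cumulant m4 - 3*m2^2 after removing the sample mean."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [v - mean for v in samples]
    m2 = sum(v * v for v in centred) / n
    m4 = sum(v ** 4 for v in centred) / n
    return m4 - 3.0 * m2 * m2

rng = random.Random(2024)          # fixed seed so the sketch is repeatable
N = 200_000
signal = [rng.choice((-1.0, 1.0)) for _ in range(N)]   # non-Gaussian source
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]        # Gaussian noise
observed = [s + w for s, w in zip(signal, noise)]

c4_noise = cum4(noise)             # close to 0: Gaussian noise has no 4th-order cumulant
c4_observed = cum4(observed)       # close to the signal's own value of -2
var_observed = sum(v * v for v in observed) / N   # close to 2: noise doubles the variance
```

The second-order statistic of the observation is strongly affected by the noise, while its fourth-order cumulant stays close to that of the signal alone, which is the behaviour exploited in (6).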


3.  Propagator  method 


Using the assumption that the matrix $\mathbf{A}(f_j)$ is of rank $P$, there exists a $(N-P) \times P$ matrix $\boldsymbol{\Pi}^+(f_j)$ such that:

$$\mathbf{A}_2(f_j) = \boldsymbol{\Pi}^+(f_j)\mathbf{A}_1(f_j), \quad \text{for } j = 1, \ldots, M \tag{7}$$

where $\mathbf{A}_1(f_j)$ and $\mathbf{A}_2(f_j)$ are two block matrices of the transfer matrix $\mathbf{A}(f_j)$, of dimensions $P \times P$ and $(N-P) \times P$ respectively. The matrix $\boldsymbol{\Pi}(f_j)$ is the propagator operator. For estimating $\boldsymbol{\Pi}(f_j)$, the fourth-order cumulant matrix (6) is used. Using (7), equation (6) can be written [19-20]:


K(fjy- 

Kifj)- 


K(fj)  H-(/,) 

A,(/,)H,(/,)A,^(/,) 

n^(/,)H;;(/,)n(/,). 


,  where 


(8) 

(9) 

(10) 

(11) 


H„(/,)=i;/».(/;)|Aj(l,0fa,(/;)a/(/;) 


For example, one can estimate the propagator by:

n(/,)=(H',:(/,))‘'H;;(/,).  (12) 


where  H^(/j )  is  the  diagonal  matrix 

|Aj(l,2)pA^(/^),....,|Aj(l,/>)Pv(//)l 

with 


The obtained propagator is used to calculate the localisation function, given by:

$$L(\theta) = \frac{1}{\mathbf{a}^+(f_j, \theta)\,\mathbf{Q}(f_j)\mathbf{Q}^+(f_j)\,\mathbf{a}(f_j, \theta)}, \quad \text{for } \theta \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right] \tag{13}$$

where $\mathbf{Q}^+(f_j) = [\boldsymbol{\Pi}^+(f_j) \mid -\mathbf{I}]$.

In order to exploit the advantage provided by the fourth-order cumulant domain and the coherent treatment, the transformation matrices are applied to the propagator operator at each frequency bin

T(/;)n(/,)=na)  for  j  where 

na) = [a,  a,*  (/,)]■'  H'=  (J.^  ond 

1  ^ 

H.(/.)=^ZAr'(/,)tf;(/,)(A,^(/,))  , 

Ai  (/, )  is  obtained  by  using  an  initial  estimates  of 

the  direction  of  arrival  of  the  sources.  The 
transformation  matrix  is  given  by  : 

The  coherent  propagator  is  then  : 

1  M 

Finally, the coherent propagator $\boldsymbol{\Pi}(f_c)$ may be used in (13) for the estimation of the directions of arrival of the wide-band sources.
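The propagator relation (7) and a localisation function of the form 1 / |Q^+ a(theta)|^2 can be sketched for a single frequency. Everything below is an assumption for illustration: a 3-sensor, 2-source array with half-wavelength spacing, a particular steering-vector sign convention, and a propagator built directly from the true steering matrix rather than estimated from cumulant matrices as in the paper.

```python
import cmath
import math

def steering(theta_deg, n_sensors=3):
    """ULA steering vector with half-wavelength spacing (assumed convention)."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * k * phase) for k in range(n_sensors)]

def propagator_row(thetas):
    """For N = 3 and P = 2, return the 1x2 row Pi^+ with A2 = Pi^+ A1."""
    a, b = steering(thetas[0]), steering(thetas[1])
    # A1 = [[a0, b0], [a1, b1]] (first P rows), A2 = [[a2, b2]] (last row)
    det = a[0] * b[1] - b[0] * a[1]
    inv = [[b[1] / det, -b[0] / det],
           [-a[1] / det, a[0] / det]]
    # Pi^+ = A2 * inverse(A1)
    return [a[2] * inv[0][0] + b[2] * inv[1][0],
            a[2] * inv[0][1] + b[2] * inv[1][1]]

def localisation(theta_deg, pi_h):
    """1 / |Q^+ a(theta)|^2 with Q^+ = [Pi^+ | -1]; floored to avoid 1/0."""
    a = steering(theta_deg)
    residual = pi_h[0] * a[0] + pi_h[1] * a[1] - a[2]
    return 1.0 / max(abs(residual) ** 2, 1e-30)

pi_h = propagator_row([10.0, 30.0])
peak_10 = localisation(10.0, pi_h)
peak_30 = localisation(30.0, pi_h)
floor_20 = localisation(20.0, pi_h)
```

Because Q^+ annihilates every column of the steering matrix, the function is very large at the two assumed source directions and moderate elsewhere; scanning theta over a grid would give the usual pseudo-spectrum.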

4.  Numerical  example 

We consider a linear array of $N = 14$ sensors uniformly spaced at half the wavelength of a central temporal frequency. The source signals are temporally stationary, zero-mean, non-Gaussian and wide-band. Four sources impinge on the array from $\theta_1 = 8°$, $\theta_2 = 10°$, $\theta_3 = 30°$ and $\theta_4 = 32°$ respectively, in the presence of colored Gaussian noise. The array noise is stationary, zero-mean, and independent of the signals. The signal-to-noise ratio is 10 dB. From the array outputs, 256 snapshots of 64 samples each were selected and the frequency components were obtained via FFT. The directions 9° and 31° given by the beamformer method are used to estimate the focusing matrices. The MUSIC spectra of the spatial spectral matrix and of the cumulant matrix based on the coherent propagator method are plotted in Figures 1 and 2 respectively. From Figure 2 it is clear that the four sources are localized by using the cumulant matrices, while they are not resolved when the spectral matrices are used.

Note that the MUSIC method cannot separate the four sources even when the number of sources is taken equal to 4, whereas the proposed method gives the exact azimuths of the sources.

Figure 1: Spectral Matrix and Coherent Signal Subspace Method.

Figure 2: Fourth-Order Cumulant and Coherent Propagator Method.

5.  Conclusion 

In this study, we have developed a coherent propagator method based on the fourth-order cumulant for locating wide-band sources from the received data in the presence of unknown noise fields. The simulation results show that the proposed algorithm asymptotically has a performance similar to (i.e., exact and unbiased) that of the standard eigenstructure algorithm applicable in known noise fields.

Abstract

The computational aspects related to sample estimation of moments involving certain "piecewise" nonlinearities are addressed with application to DOA estimation. In particular, the accuracy vs. computational saving tradeoff associated with "soft-limiting" nonlinearities can be exploited to simplify the computation of sample covariances without resulting in significant accuracy loss. It is also shown how, in the evaluation of sample cumulants, this approach can be employed to reduce the overall number of arithmetic operations using nonlinearities which act separately on the real and the imaginary parts of complex numbers.


I  Introduction 

High resolution Direction of Arrival (DOA) estimation methods such as MUSIC, ESPRIT, etc., exploit the geometric structure of the covariance matrix of the signal measured through an array of sensors. While some attention has been devoted in the literature to the computational aspects of the underlying eigen-decomposition of the covariance matrix (e.g. [1]) or to the proposition of alternative methods avoiding such eigen-decomposition (e.g. [2]), the reduction of the complexity associated with the computation of the covariance has not received equal attention.

In [3, 4], nonlinear statistics have been proposed for DOA estimation in the framework of the so-called Hybrid Nonlinear Moments (HNL); particular statistics based on the extraction of the signum of the real and imaginary parts of the complex valued signal samples have been described, extending techniques for estimation of the covariance of Gaussian processes based on the Bussgang theorem [5]. An extension of the Bussgang theorem is found in [6] for complex processes, while [7] addresses the multivariate real case. In [8], a similar technique has been adopted for estimation of the time-frequency distribution of polynomial phase signals and its accuracy has been theoretically evaluated. Recently, a different dithering/hard-clipping technique has been extended to the computation of higher-order moments in [9].
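The signum-based covariance estimation mentioned above can be illustrated for the real Gaussian case. The sketch below assumes two zero-mean, unit-variance, jointly Gaussian variables with correlation 0.6 and a fixed seed; for that case the Bussgang theorem gives E{x sign(y)} = sqrt(2/pi) E{x y}. The variable names are illustrative and do not come from the cited works.

```python
import math
import random

rng = random.Random(7)     # fixed seed so the sketch is repeatable
N = 200_000
RHO = 0.6                  # assumed true covariance of the unit-variance pair

pairs = []
for _ in range(N):
    y = rng.gauss(0.0, 1.0)
    x = RHO * y + math.sqrt(1.0 - RHO ** 2) * rng.gauss(0.0, 1.0)
    pairs.append((x, y))

# Hard-limited moment: only a sign change and an addition per sample.
sign_moment = sum(x if y >= 0.0 else -x for x, y in pairs) / N

# Bussgang proportionality factor of the signum for a unit-variance Gaussian.
bussgang_gain = math.sqrt(2.0 / math.pi)
cov_from_signs = sign_moment / bussgang_gain

# Conventional sample covariance: one multiplication per sample.
cov_direct = sum(x * y for x, y in pairs) / N
```

Both estimates land near 0.6 here; the sign-based one avoids the multiplications but, as noted in the text, has a larger variance for a given sample size.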

The main drawback of signum based nonlinearities is related to the loss of accuracy introduced by the hard-limiters.

In this contribution, we introduce piecewise nonlinearities aimed at saving computations in the numerical evaluation, while preserving the desired degree of accuracy without increasing the size of the sample. The nonlinearities herewith addressed have the following form

$$f(x) = \begin{cases} f_1(x) & \text{for } |x| < a \\ f_2(x) & \text{for } |x| \geq a \end{cases} \tag{1}$$

where the parameter $a$ controls the choice between the nonlinearities $f_1(\cdot)$ and $f_2(\cdot)$, which act on medium-low signal levels and medium-high signal levels, respectively.

The computational savings obtainable from the nonlinearities (1) are related to the particular forms assumed by $f_1(\cdot)$ or $f_2(\cdot)$. For instance, the parameter $a$ controls the tradeoff between covariance (accuracy) and signum (fast computation) when $f_1(x) = x$ and $f_2(x) = \mathrm{sign}(x)$.
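The tradeoff controlled by the threshold can be sketched as follows for the choice f1(x) = x and f2(x) = sign(x). The function name `cross_moment`, the multiplication counter, and the sample data are assumptions for illustration only.

```python
def cross_moment(xs, ys, a):
    """Sample estimate of E{x f(y)} with f(y) = y for |y| < a and
    f(y) = sign(y) otherwise. Returns (estimate, real multiplications used)."""
    acc = 0.0
    mults = 0
    for x, y in zip(xs, ys):
        if abs(y) < a:
            acc += x * y        # low level: a true multiplication
            mults += 1
        elif y > 0.0:
            acc += x            # high level: sign(y) = +1, addition only
        else:
            acc -= x            # high level: sign(y) = -1, sign flip only
    return acc / len(xs), mults

# Hypothetical data, chosen only to exercise both branches.
xs = [1.0, 2.0, -1.0, 0.5]
ys = [0.2, 3.0, -2.5, -0.1]

mixed = cross_moment(xs, ys, 1.0)                # some samples limited
covariance_like = cross_moment(xs, ys, 1e9)      # threshold never reached
signum_like = cross_moment(xs, ys, 0.0)          # every sample limited
```

A large threshold reproduces the ordinary sample covariance with one multiplication per sample, while a zero threshold reproduces the pure signum statistic with none; intermediate values give the accuracy/saving tradeoff described above.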


II  DOA  Estimation  using 
Nonlinear  Statistics 

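We are concerned with the following linear model of observations drawn from an $M$-sensor uniform linear array (ULA)

$$\mathbf{x} = \mathbf{A}(\boldsymbol{\omega}) \cdot \mathbf{s} + \mathbf{w} \tag{2}$$

where $\mathbf{s}$ is an $L$-vector of non-Gaussian, independent sources, $\mathbf{w}$ is an $M$-vector of (possibly coloured) Gaussian noises, independent of the sources, the DOAs are collected in the vector $\boldsymbol{\omega} \triangleq [\omega_1 \cdots \omega_L]^T$ and

$$\mathbf{A}(\boldsymbol{\omega}) \triangleq [\mathbf{a}(\omega_1) \cdots \mathbf{a}(\omega_L)]$$

is the matrix of the DOA-related steering vectors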

SL{aJi)  =  [l 

Modern high-resolution methods of DOA estimation rely on the subspace decomposition of the covariance matrix

$$\mathbf{R}_x \triangleq E\{\mathbf{x} \cdot \mathbf{x}^H\} = \mathbf{A} \cdot \mathbf{R}_s \cdot \mathbf{A}^H + \mathbf{R}_w \tag{3}$$

where $\mathbf{R}_s$ and $\mathbf{R}_w$ are the covariance matrices of the sources and the noises, respectively, and the dependence on the DOA vector $\boldsymbol{\omega}$ has been dropped for readability. If the number of sources $L$ is less than the number of sensors $M$, a suitable subspace decomposition of $\mathbf{R}_x$ constitutes the basis for DOA estimation. In [3, 4], an extension of this approach to nonlinear statistics is described; it is summarized below.

Let  g{x,x)  a  complex  nonlinearity,  having  denoted 
complex  conjugation  by  an  overbar  ;  taking  the  cross- 
correlations  between  x  =  \x\  ■  •  ■  Xm\  and  its  nonlin¬ 
ear  version  p(x)  [p(a:i,xi)  •  •  we  obtain 

a  class  of  matrices  having  the  same  structure  of  the  co- 
variance  matrix  (3).  Thus,  subspace  decomposition  based 
methods  can  be  still  applied.  In  fact,  we  have  [3,  4] 

Gx  =^E{x-5(x)«}  =  A-Hx-A“  +  fcs-Rw  (4) 

where  Hx  is  a  matrix  depending  on  the  pdf  of  the  sources 
and  of  the  noises  and 

(5) 


computation of the sample statistics reduces to a sign bit
adjustment. An analytical performance evaluation is
currently under development for the Gaussian case, following
the indications of [13]. Here, we will present some
simulation results concerning the accuracy/saving tradeoff.

Other computational savings are possible when the
sources are non-Gaussian distributed. In this case, DOA
estimation based on higher-order cumulants offers some
advantages concerning the blind rejection of coloured
Gaussian unwanted components or the detection of phase
coupling between the harmonics [10, 11, 12].

Nonlinear statistics possessing properties similar to
cumulants can be obtained by subtracting an adequate
quantity of covariance (second-order cumulant) from a general
nonlinear moment. In fact, the expectation

$E\{x \cdot \overline{\tilde{g}(y,\bar{y})}\} \stackrel{\mathrm{def}}{=} E\{x \cdot \overline{g(y,\bar{y})}\} - k_g \cdot E\{x \cdot \bar{y}\}$ (8)

does not contain any second-order (covariance)
contribution when the constant $k_g$ is chosen as indicated by the
Bussgang theorem [3, 4], namely

=  (9) 


is the (Bussgang) proportionality factor relating nonlinear
cross-correlations of Gaussian processes.^1

In particular, we investigate the use of a complex
limiter,^2 i.e.

$g(x_r, x_i) = \mathrm{lmt}(x_r) + j\,\mathrm{lmt}(x_i)$ (6)

where the real nonlinearity $\mathrm{lmt}(v)$ is defined as follows:

$\mathrm{lmt}(v) = \begin{cases} v & \text{for } |v| < a \\ a \cdot \mathrm{sign}(v) & \text{for } |v| > a \end{cases}$ (7)


where the parameter $a$ trades off between accuracy and
computational savings. The saving is obtained since, for
$|x_r| > a$ or $|x_i| > a$, the multiplication needed in the

^1 The partial differentiation in (5) is defined in terms of differentiations
w.r.t. the real and imaginary parts of $x = x_r + j x_i$ as follows

$\frac{\partial}{\partial x} \stackrel{\mathrm{def}}{=} \frac{1}{2}\left(\frac{\partial}{\partial x_r} - j\,\frac{\partial}{\partial x_i}\right)$

Analogously,

$\frac{\partial}{\partial \bar{x}} \stackrel{\mathrm{def}}{=} \frac{1}{2}\left(\frac{\partial}{\partial x_r} + j\,\frac{\partial}{\partial x_i}\right)$

For analytic functions, the former defines the complex differentiation rule
due to the Cauchy-Riemann conditions, while the latter vanishes for the same
reason. The nonlinearities $g(x,\bar{x})$ herewith considered are non-analytic
functions w.r.t. the complex variable $x$; rather, they should be considered
as complex valued functions of the two real variables $(x_r, x_i)$. Regarding
$x$ and $\bar{x}$ as independent variables, they can be represented by biargumental
analytic functions (see [6]).

^2 With a little abuse of notation, we use the same notation, e.g. $g_3(y,\bar{y})$
and $g_3(y_r, y_i)$, to represent a complex nonlinearity either when the
independent variables are $(x,\bar{x})$ or $(x_r, x_i)$.
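
The complex limiter of (6)-(7) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the cross-moment is conjugated on the nonlinear term following the convention of (4), and the value at |v| = a, which (7) leaves unspecified, is taken equal to a·sign(v).

```python
import math


def lmt(v: float, a: float) -> float:
    """Real limiter of (7): identity inside (-a, a), clipped to +/-a outside."""
    if abs(v) < a:
        return v
    return math.copysign(a, v)


def complex_limiter(x: complex, a: float) -> complex:
    """Complex limiter of (6): lmt applied separately to real and imaginary parts."""
    return complex(lmt(x.real, a), lmt(x.imag, a))


def limiter_cross_moment(xs, ys, a):
    """Sample estimate of E{x * conj(g(y))} and the fraction of clipped components.

    The clipped fraction indicates how often a full multiplication could be
    replaced by a sign adjustment (assumed reading of the saving argument).
    """
    acc = 0j
    clipped = 0
    for x, y in zip(xs, ys):
        acc += x * complex_limiter(y, a).conjugate()
        clipped += (abs(y.real) >= a) + (abs(y.imag) >= a)
    n = len(xs)
    return acc / n, clipped / (2 * n)
```

Lowering the threshold a increases the clipped fraction, which is the accuracy versus saving tradeoff discussed in the text.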


As an application of (8), we will show how power
nonlinearities acting separately on the real and the imaginary parts
may offer some computational savings. For instance, let us
consider the following fourth-order slice cross-cumulant
between the circularly complex variates $x$ and $y$ with
variances $\sigma_x^2$ and $\sigma_y^2$, respectively:

$\mathrm{cum}\{x,\bar{y},y,\bar{y}\} = E\{x \cdot \bar{y} \cdot y \cdot \bar{y}\} - 2\sigma_y^2 \cdot E\{x \cdot \bar{y}\}$ (10)

and consider also the following nonlinearity acting
separately on the real and the imaginary part of $y \stackrel{\mathrm{def}}{=} y_r + j y_i$:

$g_3(y_r, y_i) = y_r^3 + j\,y_i^3$ (11)

The following nonlinear moment

$E\{x \cdot \overline{\tilde{g}_3(y,\bar{y})}\} = E\{x \cdot \overline{g_3(y,\bar{y})}\} - k_{g_3} \cdot E\{x \cdot \bar{y}\}$

still possesses the cumulant property of rejecting Gaussian
additive random variables since

Moreover, mimicking the limiter behaviour, a
computationally parsimonious nonlinearity, acting separately on the
real and imaginary parts, is

$f_3(y,\bar{y}) = f(y_r) + j f(y_i)$

where

$f(v) = \begin{cases} v & \text{for } |v| < a \\ \dots & \text{for } |v| > a \end{cases}$

The Bussgang constant

associated to the nonlinearity (15) needs the first derivative

$\frac{\partial}{\partial y} f_3(y,\bar{y}) = \frac{1}{2}\left(f'(y_r) + f'(y_i)\right)$

where

$f'(v) = \begin{cases} 1 & \text{for } |v| < a \\ 3\,\dots & \text{for } |v| > a \end{cases}$

The savings offered by the nonlinear expectation
$E\{x \cdot \overline{f_3(y,\bar{y})}\} - k_{f_3} \cdot E\{x \cdot \bar{y}\}$ consist in avoiding the square
and two real multiplications involved in (12) whenever
$|y_r| < a$ or $|y_i| < a$ in sample estimation. This
constitutes a 50% saving w.r.t. the extra computations needed to
cope with the nonlinearity $g_3(\cdot)$ (12).
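
As a worked check of the Gaussian-rejection property claimed for the cubic nonlinearity (11), the Bussgang factor can be computed explicitly. The derivation below is a sketch resting on an assumption: equation (5) is not legible in this copy, so the factor is taken to be $k_g = E\{\partial g / \partial y\}$, which is what the remark that the constant "needs the first derivative" suggests. It further assumes that $x$ and $y$ are zero-mean, jointly circular Gaussian, so that $x = c\,y + n$ with $c = E\{x\bar{y}\}/\sigma_y^2$ and $n$ independent of $y$, and that $y_r$, $y_i$ are independent with variance $\sigma_y^2/2$ each.

```latex
% Assumed form of the Bussgang factor for g_3(y_r, y_i) = y_r^3 + j y_i^3
k_{g_3} = E\left\{\frac{\partial g_3}{\partial y}\right\}
        = \frac{1}{2}\,E\{3y_r^2 + 3y_i^2\}
        = \frac{3}{2}\,\sigma_y^2

% Nonlinear cross-moment for jointly circular Gaussian x = c y + n
E\{x\,\overline{g_3(y,\bar{y})}\}
  = c\,E\{(y_r + j y_i)(y_r^3 - j y_i^3)\}
  = c\,\bigl(E\{y_r^4\} + E\{y_i^4\}\bigr)
  = c \cdot 2 \cdot 3\left(\frac{\sigma_y^2}{2}\right)^2
  = \frac{3}{2}\,\sigma_y^2\,E\{x\bar{y}\}

% Hence the corrected moment vanishes in the Gaussian case
E\{x\,\overline{g_3(y,\bar{y})}\} - k_{g_3}\,E\{x\bar{y}\} = 0
```

The cross terms $E\{y_r y_i^3\}$ and $E\{y_i y_r^3\}$ vanish because the two parts are independent and zero-mean, and $E\{y_r^4\} = 3(\sigma_y^2/2)^2$ is the usual Gaussian fourth moment.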

III Simulation Results and Conclusion

Computer  simulations  have  been  performed  to  support 
the  feasibility  of  the  above  described  approaches. 

The considered scenario consists of two equiamplitude
waves, having independent and uniformly distributed
phases, impinging on an array of 8 half-wavelength spaced
sensors. First, the tradeoff between accuracy and
computation of nonlinear moments involving the complex limiter
(LMT) of (6) has been addressed in the case of white
additive Gaussian noises for different SNR values. To test
the accuracy of the limiter based estimation, the angular
difference between the DOA's is diminished from 10° in
Fig. 1 (DOA's 15° and 25°) to 5° in Fig. 2 (DOA's 15°
and 20°). In the same figures, the MSE relative to the
DOA 15° is reported also for the covariance based
estimation (curves labelled "COV") and the so-called "complex
hybrid signum" based estimation (curves labelled "CHS");
the latter is obtained for a = 0 in (6). All the estimates
are drawn from Root-MUSIC applied to sample statistics
of size N = 100 independent snapshots. Averages over
100 Monte Carlo runs have been performed. The value of
the parameter a has been chosen equal to half the standard
deviation of the signal received at the generic sensor: we
have observed that this roughly corresponds to a 25%
saving of real multiplications and additions. We see that, for
a DOA spacing of 5°, the complex signum (CHS) based
estimation presents an appreciable accuracy loss, while the limiter
based estimation (LMT) shows the expected intermediate
behavior between covariance and signum based estimates.


Figure 1: Two sources from 15° and 25° in white
noises: MSE of Root-MUSIC estimates of DOA 15°
from 100 snapshots and 100 Monte Carlo runs.


Figure 2: Two sources from 15° and 20° in white
noises: MSE of Root-MUSIC estimates of DOA 15°
from 100 snapshots and 100 Monte Carlo runs.


In Figs. 3 and 4, the case of highly correlated noises, with
correlation coefficient very close to one, has been
considered. Here, estimation drawn from the nonlinearity $f_3(\cdot,\cdot)$
in (14) is considered along with that relative to fourth-order
slice cumulants defined in (10), and the mean squared
estimation errors of DOA 15° and of DOA 25° are plotted in
Fig. 3 and Fig. 4, respectively, for a sample size of N = 256
independent snapshots. The value of a is chosen as already
done in Figs. 1 and 2 and, also in this case, it roughly
corresponds to a 25% saving of real multiplications and sums.
We see that the accuracy of the estimation based on (14)
(curves labelled "Cube") is even slightly better than that
obtained from fourth-order cumulants (curves labelled
"Cum4"), which is far from being an "optimal" statistic.
In conclusion, the use of nonlinear statistics in DOA
estimation can save computations without affecting the overall
accuracy. A performance analysis of the complex limiter
(6) based estimation is currently under investigation in the
reference case of Gaussian sources in Gaussian noise.


Figure 3: Two sources from 15° and 25° in highly
correlated noises: MSE of Root-MUSIC estimates of
DOA 15° from 256 snapshots and 100 Monte Carlo
runs.
ABSTRACT

Over the last decade, higher order (HO) methods have
been strongly developed, in particular to blindly separate
instantaneous mixtures of statistically independent
stationary sources. However, in many situations of
practical interest, the received sources are
(quasi)-cyclostationary (digital radiocommunications) and are not
always statistically independent but may be correlated with
each other, which occurs in particular for HF links or in
mobile radiocommunications contexts where propagation
multipaths are omnipresent. In such situations, the
behaviour of the classical HO blind source separation
methods is not known, which may limit the
use of these methods in operational contexts. The purpose
of this paper is precisely to fill this gap
by analysing the behaviour, in
radiocommunications contexts, of three classical HO blind
source separation methods when several potentially
correlated paths of each source, assumed
(quasi)-cyclostationary, are received by the array.

1.  INTRODUCTION 

Over the last decade, higher order (HO) methods have
been strongly developed, in particular to blindly separate
instantaneous mixtures of statistically independent and
stationary sources [1-5]. In [6-7], the performance of two of
these methods, corresponding to the so-called JADE
method [2] and to the one which optimizes a contrast
function squaring the sample fourth-order autocumulants
[3], has been analysed for arbitrary scenarios of statistically
independent and stationary sources. In the same way,
the performance of a third method [5], optimizing a
contrast function which does not square the sample
fourth-order autocumulants, has been presented recently in
[8], still for statistically independent stationary sources.

However, in many situations of practical interest, the
received sources are (quasi)-cyclostationary (digital
radiocommunications) and are not always statistically
independent but may be correlated with each other, which
occurs in particular for HF links or in mobile
radiocommunications contexts where propagation
multipaths are omnipresent. In such situations, the
behaviour of the previous HO blind source separation
methods is not known, which may limit the
use of these methods in operational contexts.

The purpose of this paper is precisely to fill this gap
by analysing the behaviour, in
radiocommunications contexts, of the three HO blind
source separation methods introduced in [2], [3] and [5]
respectively, when several potentially correlated paths of
each source, assumed (quasi)-cyclostationary, are received
by the array.

2.  HYPOTHESIS  AND  PROBLEM 
FORMULATION 

Consider an array of N Narrow-Band (NB) sensors and
let us call $\mathbf{x}(t)$ the vector of the complex amplitudes of the
signals at the output of these sensors. Each sensor is
assumed to receive a noisy mixture of P statistically
independent NB (quasi)-cyclostationary sources, with their
associated propagation multipaths. Under these
assumptions, the observation vector $\mathbf{x}(t)$ can be written as
follows

$\mathbf{x}(t) = \sum_{i=1}^{P} \sum_{k=1}^{M_i} \alpha_{ik}\, m_i(t - \tau_{ik})\, e^{-j\omega_0 \tau_{ik}}\, \mathbf{a}_{ik} + \mathbf{b}(t)$

$\quad\;\; = \sum_{i=1}^{P} \mathbf{A}_i\, \mathbf{m}_i(t) + \mathbf{b}(t) \triangleq \mathbf{A}\, \mathbf{m}(t) + \mathbf{b}(t)$ (2.1)

where $\mathbf{b}(t)$ is the noise vector, assumed zero-mean,
stationary, spatially white and Gaussian, $\omega_0$ is the carrier
pulsation, $M_i$ is the number of paths associated to the
source $i$, $m_i(t)$ is the complex envelope of the source $i$,
$\alpha_{ik}$, $\tau_{ik}$ and $\mathbf{a}_{ik}$ are the complex attenuation, the delay and
the steering vector of the path $k$ of the source $i$, $\mathbf{m}_i(t)$ is
the ($M_i \times 1$) vector whose components are the complex
signals $m_{ik}(t) = \alpha_{ik}\, m_i(t - \tau_{ik})\, e^{-j\omega_0 \tau_{ik}}$ ($1 \le k \le M_i$), $\mathbf{A}_i$
is the ($N \times M_i$) matrix of the source steering vectors $\mathbf{a}_{ik}$
($1 \le k \le M_i$), $\mathbf{m}(t)$ is the ($M \times 1$) vector obtained by
concatenation of the vectors $\mathbf{m}_i(t)$ and $\mathbf{A}$ is the ($N \times M$)
matrix of all the vectors $\mathbf{a}_{ik}$, where $M$ is the sum of the
$M_i$ ($1 \le i \le P$).

In these conditions, the correlation matrix of the
observation vector, $R_x(t) = E[\mathbf{x}(t)\mathbf{x}(t)^\dagger]$, can be written as

$R_x(t) = \mathbf{A}\, R_m(t)\, \mathbf{A}^\dagger + \eta_2 I$ (2.2)

where $\dagger$ means transposition and conjugation, $\eta_2$ is the
mean power of the noise per sensor, $I$ is the identity matrix
and $R_m(t) = E[\mathbf{m}(t)\mathbf{m}(t)^\dagger]$ is the correlation matrix of the
vector $\mathbf{m}(t)$.

In the same way, the quadricovariance $Q_x(t)$ of the
observation vector $\mathbf{x}(t)$, whose components, defined by
$Q_x(i,j,k,l)(t) = \mathrm{Cum}(x_i(t), x_j(t)^*, x_k(t)^*, x_l(t))$, are the
fourth order cumulants of $\mathbf{x}(t)$, can be written as

$Q_x(t) = (\mathbf{A} \otimes \mathbf{A}^*)\, Q_m(t)\, (\mathbf{A} \otimes \mathbf{A}^*)^\dagger$ (2.3)

where $Q_m(t)$ is the quadricovariance of the vector $\mathbf{m}(t)$, $*$
means complex conjugation and $\otimes$ corresponds to the
Kronecker product.

In fact, the expression (2.1) describes N particular
convolutive mixtures of P statistically independent sources
at the output of the sensors. These mixtures could be
processed by any of the blind separators of convolutive mixtures
developed in recent years [9]. However, in practical
situations, for reasons such as
numerical complexity, one may choose to process the
vectorial mixture (2.1) as an instantaneous one, considering
each propagation path as a particular source. This is the
philosophy we adopt in this paper. In these conditions,
although in (quasi)-cyclostationary contexts it may be
advantageous to implement a Poly-Periodic (PP) and
Widely Linear structure of array filtering [10], the problem
of source separation we address in this paper is to find the
Linear and Time Invariant (TI) ($N \times M$) separator $W$,
outputting the vector $\mathbf{y}(t) = W^\dagger \mathbf{x}(t)$ and giving, to within
a diagonal and a permutation matrix, the best estimate of
the vector $\mathbf{m}(t)$. In the following sections, we study the
behaviour of the three HO blind source separators $W$
introduced in [2], [3] and [5] respectively, for different
scenarios of sources and paths, for several digital modulations
and relative time delays between the paths.

3. HO BLIND SOURCE SEPARATION OF
(QUASI)-CYCLOSTATIONARY SOURCES

3.1  Possible  HO  blind  source  separators 

In (quasi)-cyclostationary contexts, the matrices (2.2)
and (2.3) become time-dependent and, more precisely, PP.
As a consequence, the matrices $R_x(t)$ and $Q_x(t)$ have
Fourier series expansions which exhibit, in particular, the
cyclic frequencies of the observations. It may be very
useful to exploit the information contained in all the cyclic
frequencies of the observations to improve the performance
of the HO blind source separators, as has been shown
recently in [11]. However, for particular reasons such as
numerical complexity, we may prefer to still use, in
(quasi)-cyclostationary contexts, the classical methods of
HO blind source separation introduced in [2], [3] or [5],
which, in this case, exploit only the information contained
in the cyclic frequency zero of $R_x(t)$ and $Q_x(t)$, i.e. in the
temporal means $R_x = \langle R_x(t) \rangle$ and $Q_x = \langle Q_x(t) \rangle$ of $R_x(t)$
and $Q_x(t)$ respectively. This is the choice we adopt in this
paper.

Note that for stationary sources, the temporal means $R_x$
and $Q_x$ of the 2nd and 4th order statistics correspond to the
statistics themselves, which is not the case for
(quasi)-cyclostationary sources. In this latter case, $R_x$ and $Q_x$ can
still be written as (2.2) and (2.3) but where $R_m(t)$ and
$Q_m(t)$ are replaced by their temporal means, noted $R_m$ and
$Q_m$ respectively. Thus, the temporal mean operation
obviously preserves the algebraic structure of $R_x(t)$ and
$Q_x(t)$ and also the potential second and fourth order
statistical independence of the paths ($R_m$ is still diagonal
and the non zero elements of $Q_m$ are still the 4th order
autocumulants of the paths when the latter are
independent).

3.2  Statistics  estimation 

It is well known that for zero-mean, stationary and
ergodic sources, the classical estimators of the 2nd and 4th
order cumulants provide asymptotically unbiased estimates
of the 2nd and 4th order cumulants of the data, whose
variance tends to zero when the number of independent
samples increases. However, in the presence of
(quasi)-cyclostationary and cyclo-ergodic sources, we must ask
whether these classical estimators still generate
asymptotically unbiased estimates of the temporal mean of the
data statistics. The answer to this question has been given
in [12] and is negative in the general case for the 4th order
cumulant. More precisely, noting $R_x^{\alpha}(i,j)$, $C_x^{\beta}(i,j)$ and
$M_x^{\gamma}(i,j,k,l)$ the coefficients associated to the cyclic
frequencies $\alpha$, $\beta$ and $\gamma$ in the Fourier series expansion of
$R_x(i,j)(t) = E[x_i(t)x_j(t)^*]$, $C_x(i,j)(t) = E[x_i(t)x_j(t)]$ and
$M_x(i,j,k,l)(t) = E[x_i(t)x_j(t)^* x_k(t)^* x_l(t)]$ respectively, it
has been shown in [12] that

$Q_x(i,j,k,l) = M_x^{0}(i,j,k,l) - \sum_{\alpha} R_x^{\alpha}(i,j)\, R_x^{-\alpha}(l,k) - \sum_{\alpha} R_x^{\alpha}(i,k)\, R_x^{-\alpha}(l,j) - \sum_{\beta} C_x^{\beta}(i,l)\, C_x^{\beta}(j,k)^*$ (3.1)

while the classical 4th order cumulant estimator generates
asymptotically an apparent 4th order cumulant given by

$Q_{xa}(i,j,k,l) = M_x^{0}(i,j,k,l) - R_x^{0}(i,j)\, R_x^{0}(l,k) - R_x^{0}(i,k)\, R_x^{0}(l,j) - C_x^{0}(i,l)\, C_x^{0}(j,k)^*$ (3.2)

Comparing (3.1) and (3.2), we deduce that for
(quasi)-cyclostationary sources, the classical estimators of the
4th-order cumulant do not generate, in the general case, the true
value of the latter. This must be taken into account in the
behaviour analysis of the classical HO blind separators in
(quasi)-cyclostationary contexts, and it may even
prevent (in very particular situations) the separation of
statistically independent non-Gaussian sources [13].
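
The gap between (3.1) and (3.2) can be illustrated on a scalar toy case. The sketch below is an illustration only, not the estimator of [12]: it assumes a single zero-mean circular complex Gaussian signal whose power p(t) is periodic, so that R_x(t) = p(t), C_x(t) = 0 and M_x(t) = 2 p(t)^2. The function names and the sampled power profile are hypothetical.

```python
import cmath
import math


def fourier_coeffs(samples):
    """Fourier-series coefficients of one period of a sampled periodic function."""
    n = len(samples)
    return [
        sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)) / n
        for k in range(n)
    ]


def true_and_apparent_cumulant(power):
    """Scalar versions of (3.1) and (3.2) for a circular Gaussian signal.

    power: samples of the periodic power p(t) over one period (assumed model).
    """
    n = len(power)
    r = fourier_coeffs(power)                       # cyclic coefficients R^alpha
    m0 = sum(2.0 * p * p for p in power) / n        # M^0, since E|x|^4 = 2 p^2
    # Sum over alpha of R^alpha * R^(-alpha); index -k wraps to (n - k) % n.
    rr = sum(r[k] * r[(-k) % n] for k in range(n)).real
    q_true = m0 - 2.0 * rr                          # (3.1): both R terms coincide, C = 0
    q_apparent = m0 - 2.0 * r[0].real ** 2          # (3.2): only cyclic frequency zero
    return q_true, q_apparent


power = [1.0 + 0.5 * math.cos(2.0 * math.pi * t / 8) for t in range(8)]
q_true, q_apparent = true_and_apparent_cumulant(power)
```

For this profile the true temporal-mean cumulant is zero up to rounding, as expected for a Gaussian signal, while the apparent one equals twice the variance over time of p(t), here 0.25.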

3.3  Classical  HO  blind  separators  description 

The indirect HO methods presented in [2], [3] and [5]
aim at blindly identifying the source steering vectors
before the effective source separation, the latter being done
by implementing a spatial filtering operation from the
steering vector estimates [6]. The blind identification of
the latter requires a prewhitening of the data by the
pseudo-inverse, $R_s^{-1/2}$, of a square root of $R_s = \mathbf{A} R_m \mathbf{A}^\dagger$, which
aims at orthonormalizing (for statistically independent
paths) these steering vectors so as to search for the latter
through a unitary matrix, which is simpler to handle. For each of the
methods presented in [2], [3] and [5], this unitary matrix
must maximize a particular blind criterion which is
theoretically a function of the $Q_z$ elements, where $Q_z$ is the
temporal mean of the quadricovariance of the whitened data
$\mathbf{z}(t) = R_s^{-1/2} \mathbf{x}(t)$. In practice, however, the blind criterion
optimized by the classical HO blind separators is a function
of the $Q_{za}$ elements, where $Q_{za}$ is the apparent
quadricovariance of the whitened data.

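Using (2.1), the vector $\mathbf{z}(t)$ can be written as

$\mathbf{z}(t) = \sum_{i=1}^{P} \sum_{k=1}^{M_i} m'_{ik}(t)\, \mathbf{a}'_{ik} + \mathbf{b}'(t) = \mathbf{A}'\, \mathbf{m}'(t) + \mathbf{b}'(t)$ (3.3)

where $m'_{ik}(t)$ is the normalized complex envelope of the
path $ik$, $\mathbf{A}'$ is the ($M \times M$) matrix of the whitened path
steering vectors and $\mathbf{b}'(t)$ is the whitened noise vector.
Consequently, the $Q_z$ and $Q_{za}$ matrices can be written as

$Q_{z(a)} = (\mathbf{A}' \otimes \mathbf{A}'^*)\, Q_{m'(a)}\, (\mathbf{A}' \otimes \mathbf{A}'^*)^\dagger$ (3.4)

where $Q_{m'}$ and $Q_{m'a}$ are the true and the apparent temporal
mean of the quadricovariance of $\mathbf{m}'(t)$ respectively.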

For M stationary and statistically independent total
paths, the HO blind separators introduced in [2], [3] and [5]
have high performance [6-8], which is directly related to the
fact that the M vectors $\mathbf{a}'_{ik} \otimes \mathbf{a}'^*_{ik}$ are orthonormal
eigenvectors of $Q_{z(a)}$ associated to the non zero
eigenvalues. We must then ask whether this result still
holds for (quasi)-cyclostationary and potentially correlated
paths and, if not, what the behaviour of these separators is
in such situations. This is the purpose of the following sections.


4.  FOURTH  ORDER  CORRELATION 
PROPERTIES  OF  DIGITAL  MODULATIONS 

The analysis of the eigenstructure of $Q_z$ and $Q_{za}$ in the
presence of several potentially correlated
(quasi)-cyclostationary paths requires the analysis of $Q_{m'}$ and $Q_{m'a}$
in the same context and in particular the analysis of the
4th-order correlation properties of digital modulations. For
this purpose, we consider in this section only one source
($P = 1$) with two paths ($M_1 = 2$), we assume that $\tau_{11} = 0$,
$\tau_{12} = \tau$, note $m_1(t)$ simply $m(t)$ and we analyse the
evolution of the 16 $Q_{m'}$ and $Q_{m'a}$ elements as a function of
$\tau$. Note that due to the particular symmetries of these
matrices, the 16 elements of each matrix can be deduced
from only 5 elements corresponding to the element (1, 1,
1, 1) (temporal mean of the 4th order true or apparent
autocumulant) and to the 4 elements (1, 1, 1, 2), (1, 1, 2,
2), (1, 2, 2, 1) and (1, 2, 2, 2) (temporal mean of the 4th
order true or apparent crosscumulants).

Recalling that the 2nd order correlation coefficient
between the two considered paths is defined by $\rho_2(\tau) =
\langle E[m'(t)\, m'(t - \tau)^*] \rangle e^{j\omega_0 \tau}$, we define, for each of the two
matrices $Q_{m'}$ and $Q_{m'a}$, four 4th-order correlation
coefficients (associated to the indices $ijkl$ = 1112, 1122,
1221 and 1222) defined by

$\rho_{4(a)}[ijkl](\tau) = Q_{m'(a)}[ijkl] \,/\, Q_{m'(a)}[1111]$ (4.1)

The four coefficients $\rho_4[ijkl](\tau)$ and the four others
$\rho_{4a}[ijkl](\tau)$ characterize the true and the apparent 4th-order
correlation of all the modulations respectively. From these
coefficients, it is also possible to define, for each matrix
$Q_{m'}$ and $Q_{m'a}$, an average 4th order correlation coefficient
whose modulus can be defined by the following expression

$|\rho_{4(a),av}(\tau)| = (1/14)\,[\,4|\rho_{4(a)}[1112](\tau)| + 4|\rho_{4(a)}[1122](\tau)|
+ 2|\rho_{4(a)}[1221](\tau)| + 4|\rho_{4(a)}[1222](\tau)|\,]$ (4.2)

In order to quantify the 4th-order correlation of some
modulations, let us consider the linear modulations,
characterized by a complex envelope $m(t)$ given by

$m(t) = \sum_{n} a_n\, v(t - nT)$ (4.3)

where the complex symbols $a_n$ are i.i.d. random variables,
$T$ is the symbol duration and $v(t)$ is a real-valued pulse
function. Under these assumptions, it is possible to show
that:

- $|\rho_2(\tau)|$ and the moduli of the four true 4th-order
correlation coefficients only depend on $\tau$ and $v(t)$ but do not
depend on the symbol statistics.

- the moduli of the four apparent 4th-order correlation
coefficients depend on $\tau$, $v(t)$ and also on the 2nd and
4th-order symbol statistics, which confirms the fact that the
classical 4th-order cumulant estimator changes the 4th-order
correlation of the linear modulations.
For example, if we choose the square pulse such that
$v(t) = 1$ if $0 \le t < T$ and $v(t) = 0$ elsewhere, we find that
$\rho_2(\tau)$ and the four true 4th-order correlation coefficients
have the same modulus, equal to $1 - |\tau|/T$. This implies
that $|\rho_{4,av}(\tau)| = |\rho_2(\tau)|$ and shows that generally a 2nd order
correlation between paths also generates a 4th order
correlation.
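
The square-pulse result can be checked numerically. The sketch below is a minimal illustration with hypothetical function names; it assumes zero-mean i.i.d. symbols, in which case the temporal mean of E[m(t) m(t - tau)*] is proportional to the autocorrelation of the pulse v, so that the modulus of rho_2 is the normalized pulse autocorrelation. The pulse is assumed to be supported on [0, T).

```python
def square_pulse(t: float, T: float) -> float:
    """Square pulse: 1 on [0, T), 0 elsewhere."""
    return 1.0 if 0.0 <= t < T else 0.0


def pulse_autocorr(pulse, tau: float, T: float, steps: int = 1000) -> float:
    """Midpoint-rule integral of v(u) v(u - tau) over [-T, 2T], enough for |tau| <= T."""
    du = T / steps
    total = 0.0
    for k in range(-steps, 2 * steps):
        u = (k + 0.5) * du
        total += pulse(u, T) * pulse(u - tau, T)
    return total * du


def rho2_modulus(pulse, tau: float, T: float) -> float:
    """|rho_2(tau)| under the zero-mean i.i.d. symbol assumption."""
    return abs(pulse_autocorr(pulse, tau, T)) / pulse_autocorr(pulse, 0.0, T)


def rho4_average(c1112: float, c1122: float, c1221: float, c1222: float) -> float:
    """Average 4th-order correlation modulus of (4.2) from the four moduli."""
    return (4.0 * c1112 + 4.0 * c1122 + 2.0 * c1221 + 4.0 * c1222) / 14.0


r = rho2_modulus(square_pulse, 0.4, 1.0)   # close to 1 - 0.4 = 0.6
avg = rho4_average(r, r, r, r)             # equals r when the four moduli coincide
```

Because the weights in (4.2) sum to 14, the average reduces to the common value whenever the four moduli are equal, which is the square-pulse case stated in the text.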

The previous results are illustrated in Figure 1, which
shows the variations of $|\rho_{4,av}(\tau)|$ and $|\rho_{4a,av}(\tau)|$ as a
function of $|\rho_2(\tau)|$ for a BPSK and a QPSK modulation and
for two pulse functions corresponding to the square and the
half-Nyquist function with a roll-off of 0.25.


Fig. 1 - $|\rho_{4,av}(\tau)|$ and $|\rho_{4a,av}(\tau)|$ as a function of $|\rho_2(\tau)|$


5.  EIGENSTRUCTURE  OF  Qz  AND  Qza 

We analyse, in this section, the eigenstructure of the
matrices $Q_z$ and $Q_{za}$ in the presence of correlated paths of
(quasi)-cyclostationary independent sources.

5.1  Prewhitening  of  the  data 

The prewhitening stage of the data by the matrix $R_s^{-1/2}$
transforms (2.1) into (3.3). It is then possible to show that
in the presence of correlated paths, the whitened steering
vectors $\mathbf{a}'_{ik}$ are no longer orthonormalized. More precisely,
it can be shown that:

- $\mathbf{a}'_{ik}$ is orthogonal to $\mathbf{a}'_{jl}$ if and only if $m'_{ik}(t)$ and
$m'_{jl}(t)$ are not correlated

- $\mathbf{a}'_{ik}$ is normalized if and only if $m'_{ik}(t)$ is uncorrelated
with all the $m'_{jl}(t)$ ($jl \ne ik$)

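For example, in the case where $P = 1$ and $M_1 = 2$,
noting $\rho_2$ the 2nd order correlation coefficient between
$m'_{11}(t)$ and $m'_{12}(t)$, we find that

$\mathbf{a}'^{\dagger}_{11}\mathbf{a}'_{11} = \mathbf{a}'^{\dagger}_{12}\mathbf{a}'_{12} = 1 / (1 - |\rho_2|^2)$

$\mathbf{a}'^{\dagger}_{11}\mathbf{a}'_{12} = -\rho_2 / (1 - |\rho_2|^2)$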
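
These two expressions are the entries of the inverse of the 2 x 2 path correlation matrix. The sketch below checks this numerically under stated assumptions that are not spelled out in the text: the paths are normalized to unit power, so that R_m = [[1, rho_2], [conj(rho_2), 1]], and the mixing matrix has full column rank, in which case the Gram matrix of the whitened steering vectors equals the inverse of R_m. The function names are hypothetical.

```python
def path_correlation_matrix(rho2: complex):
    """R_m for two unit-power paths with 2nd order correlation coefficient rho2."""
    return [[1.0 + 0j, rho2], [rho2.conjugate(), 1.0 + 0j]]


def whitened_gram(rho2: complex):
    """Closed-form Gram matrix of the whitened steering vectors (inverse of R_m)."""
    det = 1.0 - abs(rho2) ** 2
    if det <= 0.0:
        raise ValueError("|rho2| must be strictly less than 1")
    return [
        [1.0 / det, -rho2 / det],
        [-rho2.conjugate() / det, 1.0 / det],
    ]


def matmul2(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


rho2 = 0.3 + 0.4j
product = matmul2(path_correlation_matrix(rho2), whitened_gram(rho2))
```

The product is the identity up to rounding, and the norms of the whitened steering vectors grow without bound as the modulus of rho_2 approaches one, which is why highly correlated paths become hard to separate.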


5.2 Eigenstructure of Qz

The statistical independence of the P considered sources
implies that the $Q_z$ matrix, defined by (3.4), can be
written as

$Q_z = \sum_{i=1}^{P} (\mathbf{A}'_i \otimes \mathbf{A}'^*_i)\, Q_{m'_i}\, (\mathbf{A}'_i \otimes \mathbf{A}'^*_i)^\dagger \triangleq \sum_{i=1}^{P} Q_{zi}$ (5.3)

where $\mathbf{A}'_i$ is the ($M \times M_i$) matrix of the source steering
vectors $\mathbf{a}'_{ik}$ ($1 \le k \le M_i$) and $Q_{m'_i}$ is the temporal mean of
the quadricovariance of the vector $\mathbf{m}'_i(t)$ whose components
are the $m'_{ik}(t)$ ($1 \le k \le M_i$). The orthogonality of the
vectors $\mathbf{a}'_{ik}$ and $\mathbf{a}'_{jl}$ for $i \ne j$ (section 5.1) implies that for
$1 \le i \le P$, the eigenvalues and eigenvectors of $Q_{zi}$ are also
eigenvalues and eigenvectors of $Q_z$. Consequently, the
eigenvalues and eigenvectors of $Q_z$ correspond to the
union of the eigenvalues and eigenvectors of the matrices
$Q_{zi}$. In other words, statistically independent sources
contribute to the eigenstructure of $Q_z$ without any
interaction between themselves. The rank $r$ of $Q_z$ is then
equal to the sum of the ranks, $r_i$, of the matrices $Q_{zi}$.

On the other hand, the rank of $Q_{zi}$, $r_i$, lies
between $M_i$ (independent paths of the source $i$) and $M_i^2$ (all
the paths of the source $i$ are correlated with each other).
However, it can be shown that for linear modulations, even
when all the paths of the source $i$ are correlated with each
other, $r_i < M_i^2$. Besides, whatever the kind of modulation,
it can be shown that the eigenvalues of $Q_{zi}$, and thus those
of $Q_z$, do not depend on the mixture matrix $\mathbf{A}$.

5.3  Eigenstructure  of  Qza 

The modification of the 4th-order correlation of the
sources by the classical 4th-order cumulant estimators
implies that there may exist situations for which the
apparent 4th-order cross-cumulant temporal mean of two
statistically independent sources is not zero [13].
Consequently, $Q_{za}$ may have a structure not similar to that
described by (5.3) and the results of section 5.2 may not
apply to $Q_{za}$. However, in most practical cases the
structure (5.3) still holds exactly or approximately for $Q_{za}$,
with $Q_{m'_i}$ and $Q_{zi}$ replaced by $Q_{m'_i a}$ and $Q_{zia}$, and the
results of section 5.2 can still be applied, despite the fact
that the apparent 4th-order autocumulants are no longer
equal to the true ones. Note, however, that for linear
modulations, the rank of $Q_{zia}$ is often equal to $M_i^2$ when
all the paths of the source $i$ are correlated with each other.

5.4  Illustrations 

Figure 2 illustrates the previous results by showing
the moduli of the non zero eigenvalues of $Q_z$ and
$Q_{za}$ in the presence of $P = 3$ independent sources
(QPSK-Nyquist, QPSK-Square, BPSK-Square) with $M_1 = 2$, $M_2 = 1$
and $M_3 = 1$, for several values of $\tau/T$, where $\tau$ is the
relative time delay between the two paths of the source 1.
The variation of $\tau$ does not modify the 2 highest values.

Fig. 2 - Eigenvalues modulus of $Q_z$ and $Q_{za}$, $P = 3$
with $M_1 = 2$, $M_2 = 1$ and $M_3 = 1$, as a function of $\tau/T$

8.  CONCLUSION 

The behaviour of the classical indirect HO blind source
separation methods has been analysed in the presence of
correlated paths of several (quasi)-cyclostationary sources,
through the analysis of the 4th-order correlation of digital
modulations and the eigenstructure of the whitened
quadricovariance. The choice of the 4th-order cumulant
temporal mean estimator has been shown to be crucial in
some cases. In most practical situations, the classical
methods do not mix paths of independent sources and
separate correlated paths up to a high 2nd order correlation.


6.  BLIND  IDENTIFICATION  AND  SOURCE 
SEPARATION 

In the presence of one source with several correlated
paths, the blind identification of the path steering vectors
cannot be done exactly since the whitened steering vectors
are not orthogonal to each other (section 5.1). In this case,
the blind estimates of the path steering vectors are linear
combinations of the true ones, with coefficients directly
related to the 4th-order correlation between the paths.
Consequently, the separation of correlated paths cannot be
optimal but still occurs up to a relatively high level of
2nd-order correlation, depending on the matrix which is
exploited (true or apparent), the modulation and the pulse
function for linear modulations.

In the presence of several independent sources with their
own paths, although there exist situations for which the
separation of the different sources (not paths) fails, even
from the exploitation of the true $Q_z$, in most practical
cases this separation occurs even from the use of $Q_{za}$.

Fig. 3 - Spatial correlation coefficient as a function of $\theta$

Figure 3 illustrates the previous results by
showing the spatial correlation coefficient between the 4
blind steering vector estimates and the array manifold of a
ULA of 10 sensors as a function of $\theta$ for one
QPSK-Nyquist source (roll-off 0.25) with two paths whose DOA are
-70° and -30° and such that $\tau/T = 0.4$, a QPSK-Square
(30°) and a BPSK-Square (50°) with one path each. Note
that $Q_z$ and $Q_{za}$ give, in that case, the same good
estimation of the path DOA by this DF method called
Blind-Maxcor [14].
Abstract

We consider the problem of source separation and,
in particular, criteria-based approaches. A generalized
definition of contrast function is given in order to consider non
symmetrical and/or non scale invariant functions. Two
generalized contrasts involving high-order cumulants are
proposed. In the case of two sources, we derive the optimal non
symmetrical coefficient by minimizing a performance index.
Finally, computer simulations are presented in order to
illustrate the results and to show the interest of considering a non
symmetrical contrast.


1.  Introduction 

The source separation problem can be simply formulated as follows: several unknown linear mixtures of certain independent signals, called sources, are observed. The goal is to recover the original sources without knowing the mixing system. Hence this must be achieved from the observations alone, which is why this problem is often qualified as “blind”.

Among the great number of approaches that have been proposed in the recent literature, we are primarily concerned with approaches based on high-order statistics criteria, e.g. [1]-[9]. In this field, contrast functions constitute separation criteria in the sense that their maximization solves the source separation problem. As defined in [2], contrasts are required to be symmetrical and scale invariant functions. The global maximization of such a contrast is a necessary and sufficient condition for source separation. Even if this is a good thing, one can be interested in finding “only” sufficient conditions. The main purpose of this communication is to show that we can take advantage of considering non symmetrical contrasts according to a generalized definition. Notice that, even if it is not clearly stated, non symmetrical criteria have been derived in [1] but with another approach, and non scale invariant functions have been given in [7].

2.  Notations  and  assumptions 

The classical linear memoryless mixture model is considered. It reads

$x(k) = G\,a(k)$  (1)

where $x(k)$ is the $(N, 1)$ vector of observations, $a(k)$ the $(N, 1)$ vector of statistically independent sources, $k$ the discrete time and $G$ the $(N, N)$ invertible mixture matrix. The source separation problem consists in estimating a matrix $H$ such that the $(N, 1)$ vector

$y(k) = H\,x(k)$  (2)

restores the $N$ input sources $a_i(k)$, $i \in \{1, \ldots, N\}$. We define the matrix $S$ of the global system as

$S = HG$,  (3)

hence according to (1) and (2)

$y(k) = S\,a(k)$.  (4)

Because the sources are assumed unobservable, there are some inherent indeterminacies in their restitution. That is, in general, we cannot identify the power and the order of each source. Hence they are said to be separated if and only if the global matrix reads

$S = DP$  (5)

where $D$ is an invertible diagonal matrix and $P$ a permutation matrix.
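The separation condition (5) can be checked numerically. The following is a minimal sketch, not taken from the paper: the function name `is_separating` and the tolerance `tol` are illustrative assumptions. It tests whether a global matrix has exactly one non-negligible entry per row and per column, which is what $S = DP$ means.

```python
def is_separating(S, tol=1e-9):
    """Return True if the square matrix S (list of rows) has the form D P.

    Hypothetical helper: D P has exactly one entry above `tol` in magnitude
    in every row, and those entries sit in distinct columns.
    """
    n = len(S)
    used_columns = []
    for row in S:
        significant = [j for j, value in enumerate(row) if abs(value) > tol]
        if len(significant) != 1:
            return False
        used_columns.append(significant[0])
    # Every column must be hit exactly once (permutation pattern).
    return sorted(used_columns) == list(range(n))
```

For instance, `is_separating([[0.0, 2.0], [-0.5, 0.0]])` holds (sources recovered up to order and scale), whereas a matrix with residual cross-talk such as `[[1.0, 0.1], [0.0, 1.0]]` does not qualify.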

The following assumptions are made:

A1a. The sources $a_i(k)$, $i \in \{1, \ldots, N\}$, are zero-mean, unit power and statistically mutually independent;

A1b. $a(k)$ is a random vector stationary up to the order under consideration, i.e. $\forall i \in \{1, \ldots, N\}$, the cumulant $\mathrm{Cum}(\underbrace{a_i(k), \ldots, a_i(k)}_{p \times}, \underbrace{a_i^*(k), \ldots, a_i^*(k)}_{q \times})$ is independent of $k$ and is denoted by $C_{p,q} a_i$;

A1c. Given $p$ and $q$, the cumulants are assumed to satisfy one of the two following conditions:

c1. $|C_{p,q} a_1| \ge \cdots \ge |C_{p,q} a_N| > 0$;

c2. $|C_{p,q} a_1| \ge \cdots \ge |C_{p,q} a_{N-1}| > |C_{p,q} a_N| = 0$.

In particular this means that at most one of the cumulants $C_{p,q} a_i$, $i \in \{1, \ldots, N\}$, is zero.

A2. The global matrix $S$ is such that $S S^H = S^H S = I$.

The set of random vectors satisfying assumptions A1a to A1c is denoted by $\mathcal{A}$. The set of matrices satisfying assumption A2 is denoted by $\mathcal{U}$. The subset of $\mathcal{U}$ of matrices satisfying (5) is denoted by $\mathcal{P}$. Finally, the set of random vectors $y(k)$ satisfying (4), where $a(k) \in \mathcal{A}$ and $S \in \mathcal{U}$, is denoted by $\mathcal{Y}$.

3.  Source  separation  criteria 

We first recall the initial definition of a contrast [2]:

Definition 1 A contrast on $\mathcal{Y}$ is a multivariate mapping $\mathcal{I}$ from the set $\mathcal{Y}$ to $\mathbb{R}$ which satisfies the following three requirements:

*1. $\forall y \in \mathcal{Y}$, $\forall S \in \mathcal{P}$, $\mathcal{I}(Sy) = \mathcal{I}(y)$;

*2. $\forall a \in \mathcal{A}$, $\forall S \in \mathcal{U}$, $\mathcal{I}(Sa) \le \mathcal{I}(a)$;

*3. $\forall a \in \mathcal{A}$, $\forall S \in \mathcal{U}$, $\mathcal{I}(Sa) = \mathcal{I}(a) \Leftrightarrow S \in \mathcal{P}$.

Such contrasts are symmetrical and scale invariant functions (*1) which have to be maximized (*2) to get separation (*3). According to this definition, numerous contrasts have been proposed, see e.g. [6]. Now, in order to consider non symmetrical (and also non scale invariant) functions, we propose the following “generalized” definition of a contrast:

Definition 2 A contrast on $(\mathcal{Y}, \mathcal{P}_d)$ is a multivariate mapping $\mathcal{I}$ from the set $\mathcal{Y}$ to $\mathbb{R}$ which satisfies, $\forall a \in \mathcal{A}$ and $\forall S \in \mathcal{U}$, the following two requirements:

R1. $\mathcal{I}(Sa) \le \mathcal{I}(a)$;

R2. $\exists\, \mathcal{P}_d \subset \mathcal{P}$, $\mathcal{P}_d \ne \emptyset$, such that $\mathcal{I}(Sa) = \mathcal{I}(a) \Leftrightarrow S \in \mathcal{P}_d$.

Such contrasts are not required to be, a priori, symmetrical or scale invariant. Moreover, all the global maxima constitute a non empty subset of the set of separating matrices. Hence the maximization of a contrast as defined in definition 2 is no longer a necessary and sufficient condition for source separation but only a sufficient one. It can also be noticed that contrasts in the sense of definition 1 are (fortunately) contrasts in the sense of definition 2. The converse is not true in general, as exemplified in the following.


Define

$\mathcal{I}_{p,q}(y) = \sum_{i=1}^{N} \gamma_i\, |C_{p,q}\, y_i|$  (6)

where the real numbers $\gamma_i$, $i \in \{1, \ldots, N\}$, are such that

$\gamma_1 \ge \cdots \ge \gamma_N > 0$.

We can now propose the following result

Proposition 1 If $p + q \ge 3$, the function $\mathcal{I}_{p,q}(y)$ is a contrast in the sense of definition 2.

For lack of space, the proof is deferred to a full-length paper [8]. Now let us consider the specific case of sources with identical sign $\varepsilon_p$ of their $(p, p)$ order cumulants, i.e. $\forall i$,

$\mathrm{sgn}(C_{p,p}\, a_i) = \varepsilon_p$,

then we have the following result:

Proposition 2 If $p \ge 2$, the function

$\mathcal{J}_{p,p}(y) = \varepsilon_p \sum_{i=1}^{N} \gamma_i\, C_{p,p}\, y_i$

is a contrast in the sense of definition 2.
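To make the difference between the two definitions concrete, the sketch below evaluates the weighted contrast of (6) for $N = 2$ real sources, $p = q = 2$, when the global matrix is a plane rotation of angle $\phi$. It relies on the multilinearity of cumulants for independent sources, $C_{2,2}\,y_1 = \cos^4\phi\,\kappa_1 + \sin^4\phi\,\kappa_2$ (and symmetrically for $y_2$). The kurtosis values (a binary source and a 4-PAM source), the weights and the function name are illustrative assumptions.

```python
import math

def contrast(phi, kappa, gamma):
    """Weighted contrast sum_i gamma_i |C_{2,2} y_i| for y = R(phi) a, N = 2.

    kappa: fourth-order cumulants (kurtoses) of the two unit-power sources.
    gamma: the two positive weights of the contrast.
    """
    c4, s4 = math.cos(phi) ** 4, math.sin(phi) ** 4
    cum_y1 = c4 * kappa[0] + s4 * kappa[1]
    cum_y2 = s4 * kappa[0] + c4 * kappa[1]
    return gamma[0] * abs(cum_y1) + gamma[1] * abs(cum_y2)

kappa = (-2.0, -1.36)                        # binary source, 4-PAM source
grid = [math.radians(k) for k in range(180)]  # rotation angles in [0, pi)

# Non symmetrical weights: the global maximum singles out phi = 0.
best_nonsym = max(grid, key=lambda p: contrast(p, kappa, (1.0, 0.5)))
```

With equal weights the values at $\phi = 0$ and $\phi = \pi/2$ coincide (both permutations are global maxima, as in definition 1), while with unequal weights only a subset of the separating rotations remains globally optimal, as in definition 2.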

4.  Gradient  based  algorithm 

4.1.  Algorithm 

In order to satisfy A2, we consider that a first stage realizes a whitening of the observations. This “classical” stage will not be discussed here. The whiteness of $y$ is then ensured if $H$ is orthonormal. In order to find such an orthonormal matrix $H$ which separates the sources, a gradient-based algorithm is proposed to maximize one of the contrasts. For this task we use a parametrization of $H$ by planar (Givens) rotations. For the sake of simplicity, we only consider the case of $N = 2$ real sources and the contrast $\mathcal{J}_{2,2}$, which will be denoted $\mathcal{J}$ in the following. Hence $H$ is parametrized by one angle:

$H = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$

and the gradient based updating rule reads

$\theta(n) = \theta(n-1) + \mu' \frac{\partial \mathcal{J}}{\partial \theta}$  (7)

where we have, after some algebra,

$\frac{\partial \mathcal{J}}{\partial \theta} = 4 \varepsilon_2 \left( \gamma_1 E[y_1^3 y_2] - \gamma_2 E[y_1 y_2^3] \right)$.  (8)

In practice we use a stochastic version of (7) by dropping the expectation operator in (8), hence

$\theta(n) = \theta(n-1) + \mu\, \varepsilon_2\, y_1(n)\, y_2(n) \left( y_1^2(n) - \delta\, y_2^2(n) \right)$  (9)

where

$\mu = 4 \gamma_1 \mu'$ and $\delta = \gamma_2 / \gamma_1$.
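The stochastic update (9) can be sketched in a few lines. The code below is an illustrative simulation under stated assumptions, not the author's implementation: it mixes one binary source and one Gaussian source with an orthonormal $G$ of angle `theta1`, uses $\varepsilon_2 = -1$ (sign of the binary source kurtosis), and the step size, iteration count, seed and function name are arbitrary choices.

```python
import math
import random

def run_adaptation(theta1=0.5, delta=0.2, mu=0.005, eps2=-1.0,
                   n_iter=6000, seed=0):
    """Run update (9) for N = 2 and return the trajectory of theta(n)."""
    rng = random.Random(seed)
    cg, sg = math.cos(theta1), math.sin(theta1)
    theta = 0.0
    history = []
    for _ in range(n_iter):
        a1 = rng.choice((-1.0, 1.0))   # binary source, unit power
        a2 = rng.gauss(0.0, 1.0)       # Gaussian source, unit power
        # Observations x = G a with an orthonormal mixing matrix.
        x1 = cg * a1 + sg * a2
        x2 = -sg * a1 + cg * a2
        # Outputs y = H(theta) x with the Givens parametrization.
        c, s = math.cos(theta), math.sin(theta)
        y1 = c * x1 + s * x2
        y2 = -s * x1 + c * x2
        # Stochastic gradient step, equation (9).
        theta += mu * eps2 * y1 * y2 * (y1 * y1 - delta * y2 * y2)
        history.append(theta)
    return history

trajectory = run_adaptation()
tail_mean = sum(trajectory[-1000:]) / 1000.0
```

Since $HG$ is a rotation of angle $\theta + \theta_1$, separation corresponds here to $\theta$ settling near $-\theta_1$; `tail_mean` averages the last iterations to smooth the residual fluctuation caused by the constant step size.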

4.2.  Convergence  analysis 

Let us consider the deviation $v = \theta - \bar{\theta}$, where $\bar{\theta}$ denotes the true value of the parameter, for which $y_1 = a_1$ and $y_2 = a_2$. Using $\theta(n) = v(n) + \bar{\theta}$, adaptation (9), written in terms of the deviation, can be approximated (at order one) by the following linear recursive equation

$v(n) = (1 - \mu A(n))\, v(n-1) + \mu\, z(n)$  (10)

where

$z(n) = \varepsilon_2 \left( a_1^3(n)\, a_2(n) - \delta\, a_1(n)\, a_2^3(n) \right)$;  (11)

$A(n) = \varepsilon_2 \left( a_1^4(n) + \delta\, a_2^4(n) - 3(1 + \delta)\, a_1^2(n)\, a_2^2(n) \right)$.  (12)

Assuming that the sources are independent and identically distributed random sequences, it is easily seen that the mean of $v(n)$ converges to zero if $0 < \mu < 2/EA$ where

$EA = |C_{2,2}\, a_1| + \delta\, |C_{2,2}\, a_2|$.  (13)

Now in the mean square, denoting $V(n) = E v^2(n)$, we have

$V(n) = (1 - \mu\lambda)\, V(n-1) + \mu^2 E z^2$  (14)

where

$\lambda = 2 EA - \mu E A^2$

and

$E z^2 = E a_1^6 + \delta^2 E a_2^6 - 2\delta\, E a_1^4\, E a_2^4$.  (15)

It is easily found that $V(n)$ converges to

$E v^2 = \frac{\mu\, E z^2}{\lambda}$  (16)

if $0 < \mu < 2 EA / E A^2$.
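As a short worked step, the limit (16) and its step-size condition follow from the fixed point of recursion (14). The notation $V_\infty$ for the limit is introduced here for illustration only.

```latex
% Fixed point of (14): set V(n) = V(n-1) = V_\infty.
V_\infty = (1 - \mu\lambda)\,V_\infty + \mu^2\,E z^2
\quad\Longrightarrow\quad
V_\infty = \frac{\mu\,E z^2}{\lambda}.
% Convergence of (14) requires |1 - \mu\lambda| < 1, i.e. 0 < \mu\lambda < 2.
% Positivity of \lambda = 2\,EA - \mu\,E A^2 gives the stated bound:
\lambda > 0 \iff \mu < \frac{2\,EA}{E A^2}.
% The upper condition \mu\lambda < 2 reads \mu^2 E A^2 - 2\mu\,EA + 2 > 0,
% which always holds since E A^2 \ge (EA)^2 makes the discriminant negative.
```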

4.3.  Performance  of  the  algorithm 


In order to take into account both the convergence speed and the mean square error, we define a performance index $Q$ as

$Q = \frac{E v^2}{\mu\lambda}$  (17)

where $\mu\lambda$ characterizes the dB gain per iteration. Now for small enough $\mu$, we have $\lambda \approx 2 EA$ and thanks to (16)

$Q \approx \frac{E z^2}{4 (EA)^2}$.  (18)

This index depends on the statistics of the sources and on the free parameter $\delta$. One can ask whether there exists a minimum of $Q$ w.r.t. $\delta$. One can easily obtain the value of $\delta$, denoted $\delta_o$, such that $\partial Q / \partial \delta = 0$:

$\delta_o = \frac{E a_1^6\, |C_{2,2}\, a_2| + E a_1^4\, E a_2^4\, |C_{2,2}\, a_1|}{E a_2^6\, |C_{2,2}\, a_1| + E a_1^4\, E a_2^4\, |C_{2,2}\, a_2|}$  (19)

It can also be shown that the second derivative of $Q$ w.r.t. $\delta$ at $\delta = \delta_o$ is positive, and thus this is a minimum of $Q$.

Given the sources, if $\delta_o < 1$ then it can be used in the adaptation (9) in order to get the best performance in the sense that $Q$ is minimum.

For example, if the two sources have the same statistics then $\delta_o = 1$. In that case, there is no interest in considering a non symmetrical contrast. Now if the second source ($a_2$) is Gaussian, i.e. $E a_2^4 = 3$ and $E a_2^6 = 15$, then $\delta_o = E a_1^4 / 5$, and if the first source $a_1$ is binary then $E a_1^4 = 1$ and $\delta_o = 1/5 = 0.2$. It can be noticed that there exist some cases where $\delta_o > 1$ (when $E a_1^4 > 5$ in the previous case) and thus, sometimes, no optimal contrast can be built from $Q$.
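The optimal coefficient can be evaluated directly from the fourth- and sixth-order moments of the sources. The sketch below assumes real, zero-mean, unit-power sources, so that $C_{2,2}\,a = E a^4 - 3$; the function name and argument names are illustrative.

```python
def optimal_delta(m4_1, m6_1, m4_2, m6_2):
    """Optimal non symmetrical coefficient delta_o for two real sources.

    m4_i, m6_i: fourth- and sixth-order moments of source i
    (zero-mean, unit power, so the kurtosis is m4_i - 3).
    """
    c1 = abs(m4_1 - 3.0)      # |C_{2,2} a_1|
    c2 = abs(m4_2 - 3.0)      # |C_{2,2} a_2|
    cross = m4_1 * m4_2       # E a_1^4 * E a_2^4
    return (m6_1 * c2 + cross * c1) / (m6_2 * c1 + cross * c2)

BINARY = (1.0, 1.0)           # values +/-1
GAUSSIAN = (3.0, 15.0)
PAM4 = (82.0 / 50.0, 730.0 / 250.0)   # values +/-3/sqrt(5), +/-1/sqrt(5)

delta_binary_gauss = optimal_delta(*BINARY, *GAUSSIAN)
delta_pam4_gauss = optimal_delta(*PAM4, *GAUSSIAN)
delta_binary_pam4 = optimal_delta(*BINARY, *PAM4)
```

The binary/Gaussian pair gives 0.2, as in the text, and the 4-PAM/Gaussian pair gives 0.328, which is the value quoted in the panel title of the case ii) figures.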


5.  Computer  simulations 


In order to illustrate the above results, some computer simulations are presented in the case of two sources ($N = 2$). The convergence of the algorithm based on adaptation (9) is illustrated by means of an index defined on the global matrix $S$ according to

$\mathrm{ind}_\alpha(S) = \sum_i \left( \sum_j \frac{|s_{ij}|^\alpha}{\max_\ell |s_{i\ell}|^\alpha} - 1 \right) + \sum_j \left( \sum_i \frac{|s_{ij}|^\alpha}{\max_\ell |s_{\ell j}|^\alpha} - 1 \right)$  (20)

where $\alpha \ge 1$. This positive index is indeed zero if $S$ satisfies (5), and a small value indicates the proximity to the desired solutions. The mixing matrix is taken orthonormal, such that the prewhitening stage is not necessary

$G = \begin{pmatrix} \cos\theta_1 & \sin\theta_1 \\ -\sin\theta_1 & \cos\theta_1 \end{pmatrix}$ for a given angle $\theta_1$.

In  all  the  simulations  the  index  ind2  is  considered. 

Three cases are presented: i) one binary source (i.e. a source taking the two values ±1 with equal probability) and one Gaussian source; ii) one 4-PAM source (i.e. a source taking the four values ±3/√5, ±1/√5 with equal probability) and one Gaussian source; and finally iii) one binary source and one 4-PAM source. We plot the evolution of θ(n) and of the index of convergence w.r.t. iterations. They are compared to those of the classical symmetrical case (i.e. δ = 1). In all cases, Fig. 1 and 2 for case i), Fig. 3 and 4 for case ii) and Fig. 5 and 6 for case iii), one can see an advantage in considering a non symmetrical contrast.

Acknowledgment:  The  author  would  like  to  thank  Dr.  N. 

Thirion  for  helpful  comments  and  insights. 

References 

[1] J.F. Cardoso, S. Bose and B. Friedlander, “On Optimal Source Separation Based on Second and Fourth Order Cumulants”, in Proc. IEEE SP Workshop on SSAP, Corfu, Greece, pp 198-201, June 1996.

[2] P. Comon, “Independent Component Analysis, a New Concept?”, Signal Processing, Vol. 36, pp 287-314, 1994.

[3] N. Delfosse and P. Loubaton, “Adaptive Blind Separation of Independent Sources: a Deflation Approach”, Signal Processing, Vol. 45, pp 59-83, 1995.

[4] Y. Inouye and T. Sato, “Unconstrained Optimization Criteria for Blind Equalization of Multichannel Linear Systems”, in Proc. IEEE SP Workshop on SSAP, Corfu, Greece, pp 320-323, June 1996.

[5] A. Mansour and C. Jutten, “Fourth-Order Criteria for Blind Source Separation”, IEEE Trans. on Signal Processing, Vol. 43, pp 2022-2025, August 1995.

[6] E. Moreau and J.-C. Pesquet, “Independence/Decorrelation Measures with Applications to Optimized Orthonormal Representations”, to appear in Proc. ICASSP’97, Munich, Germany, April 1997.

[7] E. Moreau and N. Thirion, “Multichannel Blind Signal Deconvolution Using High Order Statistics”, in Proc. IEEE SP Workshop on SSAP, Corfu, Greece, pp 336-339, June 1996.

[8] E. Moreau and N. Thirion, “Non Symmetrical Contrasts for Sources Separation”, in preparation.

[9] J.K. Tugnait, “On Blind Separation of Convolutive Mixtures of Independent Linear Systems”, in Proc. IEEE SP Workshop on SSAP, Corfu, Greece, pp 312-315, June 1996.


Figure 1. Evolution of the convergence index with the classical contrast and the non symmetrical optimal one in case i). (Panel title: “Non optimal contrast: delta = 1”.)

Figure 2. Evolution of the angle of the estimated rotation with the classical contrast and the non symmetrical optimal one in case i). (Panel title: “Non optimal contrast: delta = 1”.)

Figure 3. Evolution of the convergence index with the classical contrast and the non symmetrical optimal one in case ii). (Panel title: “Non optimal contrast: delta = 1”.)

Figure 4. Evolution of the angle of the estimated rotation with the classical contrast and the non symmetrical optimal one in case ii). (Panel titles: “Non optimal contrast: delta = 1” and “Optimal contrast: delta = 0.328”; horizontal axis: iterations.)

Figure 5. Evolution of the convergence index with the classical contrast and the non symmetrical optimal one in case iii).

Figure 6. Evolution of the angle of the estimated rotation with the classical contrast and the non symmetrical optimal one in case iii). (Panel title: “Non optimal contrast: delta = 1”; horizontal axis: iterations.)
ABSTRACT

Blind source separation is now a well known problem. Various methods have been proposed for instantaneous and convolutive mixtures of sources. Conventional antenna array processing techniques are based on the use of second order statistics but rest on restrictive assumptions. Thus, when a priori information about the propagation or the geometry of the array is not available, the model can be generalized to a blind source separation model. It supposes the statistical independence of the sources and their non-Gaussianity.

In this paper, we focus on the problem of separating narrow band sources embedded in wide band jammers. We show that the JADE algorithm [1], designed for instantaneous mixtures, is still valid in a wide band context where only the signals of interest are narrow-band. We also prove that a wide band signal tends to occupy all the degrees of freedom of the covariance matrix and modifies the signal subspace dimension.

1  PROBLEM  STATEMENT 

In  this  paper,  we  address  the  narrow  band  source 
separation  problem  embedded  in  wide  band  jammers. 
The  classical  approach  consists  in  employing  a  noise 
reduction  algorithm  based  on  second-order  statistics  of 
the  received  signals  and  using  a  priori  information  on 
signals,  wave  propagation  or  antenna. 

Nevertheless,  in  the  case  of  independent  signals  and 
jammers,  one  can  resort  to  a  more  original  approach 
which  relies  on  higher  order  statistics  to  achieve  a  ‘blind’ 
source  separation. 


This study is supported by STSIE (Service Technique des Systèmes d'Information et de l'Electronique).


We consider an array of m sensors and n sources (denoted s(t)). The array output, denoted x(t), is corrupted by independent additive Gaussian noise n(t):

x(t) = As(t) + n(t).  (1)

where A is an m x n complex matrix.

Blind source separation assumes no a priori information on A and consists in recovering the initial signals $s_i(t)$ from the observations $x_i(t)$ of the mixture (1).

In the case of instantaneous mixtures, among the various blind source separation methods available in the literature [3] [4] [5], Cardoso and Souloumiac give an algorithm, called JADE [1], which is asymptotically optimal [2] [6] and of low implementation complexity.

This technique for blind source separation proceeds in two steps: normalization by whitening the signal part (using second-order statistics) and joint diagonalization of a set of eigenmatrices (using higher order statistics).

We know that if $(A_0, s_0, n_0)$ is a solution of (1) then

$(A_0 \Lambda P,\; P^H \Lambda^{-1} s_0,\; n_0)$

is also a solution of (1), where $\Lambda$ is a diagonal matrix and $P$ a permutation matrix.

This source separation technique has been used for interference cancellation (synthesis in [7]).

2  WIDE  BAND  CONTEXT 

We first deal with the influence of the bandwidth on the eigenvalues of the covariance matrix. Then, a realistic simulation of the wave propagation through an array of sensors makes it possible to take into account the wide band effects on the spatial-temporal structure of the signals.

The narrow band model makes it possible to treat the propagation delays of the signals between sensors as simple phase shifts independent of the frequency.

In the case of wide band signals, we must take into account the influence of the bandwidth on the signal subspace dimension. Some algorithms specific to wide band signals or convolutive mixtures can be found in the literature [8], [9], [10].

In this paper we want to show the behavior of the JADE algorithm in a wide band context.

Many simulations have been performed using different signals (BPSK, QPSK, Gaussian jammers...) with variable frequencies: we give some interesting results.

3  BANDWIDTH  INFLUENCE 


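The spatial covariance matrix of x(t) can be written:

$R_x = R_s + \sigma^2 I$

where

$R_s = \int_{\nu_0 - B/2}^{\nu_0 + B/2} \Phi(\nu)\, \Phi^H(\nu)\, d\nu$

where $B$ is the signal bandwidth, $\nu_0$ the carrier frequency and $\Phi(\nu)$ the steering vector of the signal.

We can express $\Phi(\nu)$ near $\nu_0$ with a Taylor series expansion and, after a simple change of variable ($\nu \to \nu - \nu_0$), we have:

$\Phi(\nu + \nu_0) = \Phi(\nu_0) + \nu\, \dot{\Phi}(\nu_0) + \frac{\nu^2}{2}\, \ddot{\Phi}(\nu_0) + \frac{\nu^3}{6}\, \dddot{\Phi}(\nu_0) + \cdots = \sum_{n=0}^{N-1} \frac{\nu^n}{n!}\, \Phi^{(n)}(\nu_0)$

where $\dot{\Phi}(\nu)$ is the derivative of $\Phi(\nu)$ with respect to $\nu$. So we can say that:

$\Phi(\nu + \nu_0)\, \Phi^H(\nu + \nu_0) = \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} \frac{\nu^{n+m}}{n!\, m!}\, \Phi^{(n)}(\nu_0)\, \Phi^{(m)H}(\nu_0)$

So:

$R_s = \int_{-B/2}^{B/2} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} \frac{\nu^{n+m}}{n!\, m!}\, \Phi^{(n)}(\nu_0)\, \Phi^{(m)H}(\nu_0)\, d\nu = \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} \frac{1}{n!\, m!}\, \Phi^{(n)}(\nu_0)\, \Phi^{(m)H}(\nu_0) \int_{-B/2}^{B/2} \nu^{n+m}\, d\nu$

We can see that only the terms with odd $(n+m+1)$ are non zero. The spatial covariance matrix of the signal part can be written as:

$R_s = B\, \Phi(\nu_0)\, \Phi^H(\nu_0) + \frac{B^3}{12}\, \dot{\Phi}(\nu_0)\, \dot{\Phi}^H(\nu_0) + \frac{B^3}{24} \left( \Phi(\nu_0)\, \ddot{\Phi}^H(\nu_0) + \ddot{\Phi}(\nu_0)\, \Phi^H(\nu_0) \right)$

if we stop the expansion at the third order.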

In the case of a uniform rectilinear array with equidistant sensors, we can give the expression of the steering vectors:

$\Phi(\nu) = (\Phi_1, \Phi_2, \ldots, \Phi_N)^T$

with $\Phi_k = e^{\,j 2\pi (k-1) \frac{d \sin\theta}{c} \nu}$, where $c$ is the propagation velocity of light, $d$ the distance between two adjacent sensors and $\theta$ the wave-front direction.

In this context, we can see that:

$\dot{\Phi}_k = j 2\pi (k-1) \frac{d \sin\theta}{c}\, \Phi_k$

so:

$\dot{\Phi} = D\, \Phi$

where $D$ is a diagonal matrix. We can also write:

$\ddot{\Phi} = D^2\, \Phi$

We  can  now  diagonalize  the  covariance  matrix  and  find 
the  eigenvalues 

'  36  N-l  I  C  j 

and 

,  1  „2N(N-hl)('jidsineY 

We  can  find  another  eigenvalue  but  if  we  stop  the 
computation  at  the  second  order,  two  eigenvalues  only 
appear. 
If we consider the apparent width of the antenna seen from the source, called $D_a$ ($D_a = (N-1)\, d \sin\theta$ for a uniform rectilinear array), the eigenvalues become

'  36  (N-1)^  \  c  J 

and 

X  -  ^  N(N  +  l)r7iBD,Y 
'  36  (N-1)^  I  c  J 

In the case of a uniform circular array with equidistant
sensors, we have for the eigenvalues of the covariance
matrix:

X,  =  N 


^3  = 


N 

320l  c  J 


To  illustrate  the  bandwidth  influence,  we  choose  an  array 
with  eight  sensors  and  we  are  interested  in  eigenvalues  of 
the  covariance  matrix. 

A  wide-band  signal  occupies  three  eigenvalues  in  the 
covariance  matrix  (figure  2). 


Eigenvalues 

If we increase the bandwidth of the signal, it tends to occupy all the degrees of freedom of $R_s$ (figure 3).

In this context, we want to know whether it is possible to separate a narrow band signal from a mixture composed of wide band jammers.
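The spreading of a wide band signal over several eigenvalues can be illustrated without an eigenvalue solver. The sketch below is an illustration under stated assumptions, not the paper's computation: it assumes a flat-spectrum source on a uniform linear array, for which the magnitude of entry $(k, l)$ of $R_s$ is $B\,|\mathrm{sinc}(\pi (k-l) \tau B)|$ with $\tau = d \sin\theta / c$, and it uses the participation ratio $(\mathrm{tr}\, R_s)^2 / \mathrm{tr}(R_s^2)$ as a proxy for the number of significant eigenvalues. The function name, the default direction (60°) and the half-wavelength spacing are arbitrary choices.

```python
import math

def effective_rank(n_sensors, rel_bandwidth,
                   sin_theta=math.sin(math.radians(60.0)),
                   spacing_wavelengths=0.5):
    """Participation ratio (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    rel_bandwidth: B / nu0; spacing_wavelengths: d expressed in wavelengths
    at the carrier frequency nu0, so that tau * B = spacing * sin(theta) * B/nu0.
    """
    def sinc(x):
        return 1.0 if x == 0.0 else math.sin(x) / x

    sum_sq = 0.0
    for k in range(n_sensors):
        for l in range(n_sensors):
            x = math.pi * (k - l) * spacing_wavelengths * sin_theta * rel_bandwidth
            sum_sq += sinc(x) ** 2       # |R_kl|^2 / B^2
    # tr(R_s) = N * B and tr(R_s^2) = B^2 * sum_sq for a Hermitian matrix.
    return n_sensors ** 2 / sum_sq

narrow = effective_rank(8, 0.001)
medium = effective_rank(8, 0.1)
wide = effective_rank(8, 0.5)
```

For an eight-sensor array the ratio stays close to 1 for a narrow band signal and grows with the relative bandwidth, which is consistent with the statement that a wide band signal occupies more degrees of freedom of $R_s$.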

4  JADE  IN  WIDE-BAND  CONTEXT 


If we consider that $A$ can be decomposed as

$A = U\, \Sigma^{1/2}\, \Pi$

with $U$ and $\Pi$ unitary matrices, and if $E[s(t)\, s(t)^H] = I$, then the covariance matrix of the array output is given by:

$R_x = E[x(t)\, x(t)^H] = U\, \Sigma\, U^H + \sigma^2 I$

Whitening: we can decompose $R_x$ into eigenvectors and eigenvalues:

$R_x = [\,U \;\; U_n\,] \begin{pmatrix} \Sigma + \sigma^2 I & 0 \\ 0 & \sigma^2 I \end{pmatrix} [\,U \;\; U_n\,]^H$

We can see that:

$\Sigma^{-1/2}\, U^H x(t) = \Pi\, s(t) + \Sigma^{-1/2}\, U^H b(t)$

The whitening step is directly linked to the estimation of the signal subspace dimension.

An error in this estimation will prevent JADE from separating the signals.

Without thermal noise, we can find $\hat{s}(t) = \Pi\, s(t)$.

Thus we can determine $s(t)$ up to a unitary matrix.


Estimation of s(t): with the normalized problem, the identification of the mixing matrix consists only in estimating a unitary matrix from the fourth order statistics of the whitened signals.

For JADE, the unitary matrix maximizes the criterion:

$\sum_{i,k,l=1}^{M} \left| \mathrm{Cum}(x_i, x_i^*, x_k, x_l^*) \right|^2$

where Cum represents the fourth order cumulant in our special case.
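The criterion above only involves sample fourth-order cumulants. The sketch below shows one way to estimate them for zero-mean complex signals, using the standard expression $\mathrm{Cum}(a,b,c,d) = E[abcd] - E[ab]E[cd] - E[ac]E[bd] - E[ad]E[bc]$. It is a plain illustration: the function names are assumptions, and it is not the JADE implementation itself (no joint diagonalization is performed).

```python
def conj(seq):
    """Element-wise complex conjugate of a sequence."""
    return [complex(v).conjugate() for v in seq]

def cum4(a, b, c, d):
    """Sample fourth-order cumulant of four zero-mean sequences."""
    n = len(a)

    def mean(values):
        return sum(values) / n

    m_abcd = mean([w * x * y * z for w, x, y, z in zip(a, b, c, d)])
    m_ab = mean([w * x for w, x in zip(a, b)])
    m_cd = mean([y * z for y, z in zip(c, d)])
    m_ac = mean([w * y for w, y in zip(a, c)])
    m_bd = mean([x * z for x, z in zip(b, d)])
    m_ad = mean([w * z for w, z in zip(a, d)])
    m_bc = mean([x * y for x, y in zip(b, c)])
    return m_abcd - m_ab * m_cd - m_ac * m_bd - m_ad * m_bc

def jade_criterion(signals):
    """Sum over i, k, l of |Cum(x_i, x_i*, x_k, x_l*)|^2."""
    total = 0.0
    for xi in signals:
        for xk in signals:
            for xl in signals:
                total += abs(cum4(xi, conj(xi), xk, conj(xl))) ** 2
    return total

bpsk = [1.0, -1.0] * 50
r = 2 ** -0.5
qpsk = [complex(r, r), complex(r, -r), complex(-r, r), complex(-r, -r)] * 25
```

For a unit-power BPSK sequence the auto-cumulant is -2, and for a balanced unit-power QPSK sequence it is -1, whereas it vanishes (asymptotically) for a Gaussian signal, which is why at most one Gaussian signal can be handled.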

Two hypotheses are required for the estimation of this matrix: the signals $s_i(t)$ must be independent and non-Gaussian (we can separate at most one Gaussian signal). So Cardoso and Souloumiac proposed [3] a unitary transform of the whitened signals with joint diagonalization of a set of eigen-matrices.


The first step of the JADE algorithm is the normalization by whitening the signal part using second-order statistics.

When a signal occupies more than one degree of freedom, it is very difficult to estimate the signal subspace dimension.

For example, in figure 2 we do not know whether we have only one signal which occupies three degrees of freedom or three narrow band signals.

Equation (1) is: x(t) = As(t) + b(t)

where A is the mixing matrix of the signals of interest and the nuisance signals.


5  EXPERIMENTAL  RESULTS 

We  saw  that  a  wide  band  signal  tends  to  occupy  all  the 
degrees  of  freedom  of  the  covariance  matrix  and  to 
modify  the  signal  subspace  dimension. 

Moreover, a realistic simulation of the wave propagation through an array of sensors makes it possible to take into account the wide band effects on the spatial-temporal structure of the signals.

We  consider  now  various  examples  for  simulation. 
The first case is the mixture of a narrow-band signal (BPSK (figure 4a1)) and two wide band jammers (a BPSK jammer (figure 4a2) and a Gaussian jammer (figure 4a3)).

We might expect these signals to occupy three degrees of freedom of the covariance matrix. But in fact, after mixing on an antenna array, these signals occupy 7 degrees of freedom of the covariance matrix (figure 4c).

If we only take three for the signal subspace dimension in the whitening, the second stage of the separation (joint diagonalization) is not possible.

In fact, the first stage of the JADE algorithm must take into account this signal subspace dimension.

In our case, we normalize on a dimension of order six.

Figure 4b shows the 6 beams formed by the JADE algorithm and the separation of the narrow band signal from the other components of the initial mixture.

6  CONCLUSION 

This  paper  shows  that  it  is  possible  to  extend  the 
separation  abilities  of  blind  source  separation  algorithms 
initially  designed  for  instantaneous  mixture  to  a  more 
general  class  of  problems  where  only  the  signals  of 
interest  are  narrow-band. 

In this context, we adapted a specific algorithm called JADE to some cases where the jammers are wide-band. The simulations showed that this algorithm is well suited to antenna arrays.
Figure 4a: Source and jammer signals (horizontal axis: samples)

Figure 4b: 6 beams with JADE (horizontal axis: samples)

Figure 4c: Eigenvalues
PLENARY


Cumulant  Tensors 


Dr.  Pierre  Comon,  EURECOM  (FRANCE) 

Cumulants  of  multidimensional  random  variables  satisfy  the  properties  that  allow  them  to  be  considered  as 
tensors.  Most  cumulant-based  signal  processing  algorithms  actually  use  slices  of  those  tensors,  mainly 
because  numerical  algorithms  available  today  are  only  able  to  manipulate  matrices.  In  addition,  very  few 
works  in  the  literature  have  addressed  the  problem  of  decomposing  or  factorizing  tensors.  This  would  be  a 
discrepancy  if  the  problem  was  not  so  difficult,  as  emphasized  in  the  talk.  However,  it  is  still  possible  to 
establish some links with the Eigenvalue decomposition, the congruent diagonalization, or the Cholesky
factorization  of  matrices.  But  striking  differences  can  be  pointed  out.  It  is  hoped  that  these  first  basic 
statements  will  motivate  further  developments  of  tensor-based  algorithms. 
ABSTRACT

A Generalised Predictor Space representation (of nonlinearity order two and memory M) for nonGaussian and nonminimum phase ARMA processes is proposed here, by expanding the underlying Hilbert space of finite L2 norm random variables, which is now composed of linear combinations of linear as well as second order nonlinear terms of the process samples. Here the higher order statistical information enters into the picture in a natural way through the nonlinear terms. It is expected that the geometrical structure provided by the proposed predictor space will simplify the modeling of these processes. A set of new innovation vectors is defined on this space. Some of the properties of the new space are presented. The finite dimensionality of the proposed predictor space, when the underlying process admits a nonGaussian and nonminimum phase ARMA representation, is proved. The application of the proposed theory to the estimation of nonGaussian and nonminimum phase ARMA process parameters is also discussed.


1.  Introduction: 

In general, Higher Order Statistics [HOS], in conjunction with second order statistics, provide additional information regarding the statistical properties of random processes and the parameters of systems, beyond what can be obtained from the latter alone. Thus they provide us with a more accurate characterization of random signals and systems. An important motivation behind the use of HOS is the fact that second order statistics yield only the spectrally equivalent minimum phase factors of a nonminimum phase process. On the other hand, HOS yield the true phase of the process up to a linear phase term [4]. In addition, the use of HOS can potentially improve the estimation accuracy, i.e., reduce the variance of the estimated parameters. It is also known that, when the underlying nonGaussian ARMA process is nonminimum phase, the optimal predictor in the mean square sense is not linear, but a nonlinear function of the past data. Here we present a generalised predictor space representation [of nonlinearity order q=2 and memory N], which is constructed by expanding the Hilbert space considered previously for Gaussian processes to include the nonlinear terms as well, in order to take care of the nonGaussianity of the process.

The  organisation  of  the  paper  is  as  follows. 
Section  1  presents  a  brief  introduction  to  the  general  area 
of  HOS  and  also  the  motivation  behind  the  proposed 
representation.  Section  2  presents  the  basic  definition  of 
the  generalised  predictor  space  of  nonlinearity  order  two 
and  memory  N.  A  new  innovation  vector  is  defined  on  the 
proposed  space.  Some  of  the  properties  of  the  innovation 
vectors  are  also  presented.  This  also  contains  a  discussion 
on  p-step  ahead  predictor  in  terms  of  the  past  innovations 
and  also  on  the  similarity  of  the  proposed  representation 
with  the  well  known  Wold’s  decomposition.  In  section  3, 
we prove the finite dimensionality of the proposed space,
when  the  process  admits  nonGaussian  ARMA 
representation.  The  implications  of  the  proposed  predictor 
space  approach  and  its  application  to  nonGaussian  ARMA 
parameter  estimation  are  discussed  in  section  4.  Section  5 
contains  the  conclusions  drawn  from  the  results  obtained. 

2.  Preliminaries: 

The  basic  definition  of  the  generalised  predictor 
space  of  nonlinearity  order  two  and  memory  N  is  as 
follows: 
Let $\{\, y(t),\ t \in \mathbb{Z} \,\}$ be a zero mean stationary
random sequence with finite statistics up to fourth order,
where y(t) is not necessarily Gaussian, i.e.,

$$E\left[\, |y(t)|^{i} \,\right] < \infty, \quad i = 2, 3, 4. \tag{1}$$

Let us define the Hilbert spaces associated with
this nonGaussian process that are of interest to us as
follows:

$$Y_t^{+} = \text{Closure Span}\{\, 1,\ y(t+i),\ i \ge 0;\ \ y(t+i)y(t+j),\ i,j \ge 0,\ i-N \le j \le i \,\} \tag{2}$$

$$Y_t^{-} = \text{Closure Span}\{\, 1,\ y(t+i),\ i < 0;\ \ y(t+i)y(t+j),\ i,j < 0,\ i-N \le j \le i \,\} \tag{3}$$

Notice that $Y_t^{+}$ and $Y_t^{-}$ are the generalizations of the
Hilbert spaces corresponding to the closure span of linear
future and linear past terms respectively, associated with
the Gaussian case [1],[7]. The generalised predictor space
[of memory N and nonlinearity order q=2] is defined as
the Hilbert space generated by projecting the space $Y_t^{+}$
onto $Y_t^{-}$, where $(\cdot \mid \cdot)$ denotes projection:

$$Y_t^{+} \mid Y_t^{-} = \text{Span}\{\, z \mid Y_t^{-}\ ;\ z \in Y_t^{+} \,\} \tag{4}$$

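Assume that the inner product defined on the Hilbert space
for random variables X and Y is E[XY]. Let

$$\mathbf{y}(t) \triangleq \{\, y(t),\ y^{2}(t),\ y(t)y(t-1),\ \ldots,\ y(t)y(t-N) \,\}$$

$Y_t \triangleq$ the space spanned by the set of process vectors at
time t, $\mathbf{y}(t)$.

$Z_t \triangleq$ the space spanned by the set of innovation vectors
$\mathbf{z}(t)$.

$$\mathbf{z}(t) \triangleq \mathbf{y}(t) - \mathbf{y}(t) \mid Y_t^{-} \tag{5}$$

(Because of the constant 1 in $Y_t^{-}$, $\mathbf{z}(t)$ becomes a zero mean vector.)

The components of the innovation vector $\mathbf{z}(t-i)$ are

$$\begin{aligned}
z(t-i) &\triangleq y(t-i) - y(t-i) \mid Y_{t-i}^{-}, \\
z(t-i,t-i) &\triangleq y(t-i)y(t-i) - y(t-i)y(t-i) \mid Y_{t-i}^{-}, \\
z(t-i,t-i-1) &\triangleq y(t-i)y(t-i-1) - y(t-i)y(t-i-1) \mid Y_{t-i}^{-}, \\
&\ \ \vdots \\
z(t-i,t-i-N) &\triangleq y(t-i)y(t-i-N) - y(t-i)y(t-i-N) \mid Y_{t-i}^{-}.
\end{aligned} \tag{6}$$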
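As a rough numerical illustration of definitions (2)-(6), the projection onto the past space can be approximated from data by least squares on a truncated basis. The sketch below is an assumption-laden example, not the authors' procedure: the truncation length `L`, the function names and the test signal are all hypothetical, and the sample regression only approximates the Hilbert space projection.

```python
import numpy as np

def process_vector(y, t, N):
    """Return [y(t), y(t)^2, y(t)y(t-1), ..., y(t)y(t-N)], i.e. N + 2 entries."""
    lags = y[t - np.arange(0, N + 1)]            # y(t), y(t-1), ..., y(t-N)
    return np.concatenate(([y[t]], y[t] * lags))

def past_features(y, t, N, L):
    """Truncated basis of the past space: 1, y(t-a), and y(t-a)y(t-b)
    for 1 <= a <= b <= min(a + N, L). L is a hypothetical truncation length."""
    feats = [1.0]
    feats += [y[t - a] for a in range(1, L + 1)]
    for a in range(1, L + 1):
        for b in range(a, min(a + N, L) + 1):
            feats.append(y[t - a] * y[t - b])
    return np.array(feats)

def sample_innovations(y, N, L):
    """Least-squares residuals of the process vector on the truncated past."""
    start = max(L, N)
    times = range(start, len(y))
    Y = np.array([process_vector(y, t, N) for t in times])
    X = np.array([past_features(y, t, N, L) for t in times])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ coef, X

# Hypothetical test signal: a nonGaussian MA(1) sequence.
rng = np.random.default_rng(0)
w = rng.exponential(size=2000) - 1.0
y = w[1:] + 0.5 * w[:-1]
Z, X = sample_innovations(y, N=1, L=3)
```

Because the constant 1 is part of the regressors, the sample innovations come out with zero sample mean and are orthogonal to every regressor, mirroring the zero mean remark after (5).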

2.1  Decomposition  of  the  observation  space: 

Theorem 1:

$$Y_t^{-} = Z_{t-1} \oplus Z_{t-2} \oplus \cdots \oplus Z_{t-n} \oplus Y_{t-n}^{-} \tag{7}$$

Proof: By induction on n, it is easy to prove this by using
the orthogonal decomposition theorem.

Corollary 1.1: The innovation vectors are mutually
orthogonal, i.e.,

$$\text{Span}\{\mathbf{z}(t-i)\} \perp \text{Span}\{\mathbf{z}(t-j)\}, \quad \forall\ i \ne j \tag{8}$$

Proof: Follows from theorem 1.

Corollary 1.2: $\mathbf{z}(t)$ is orthogonal to the projection of y(t-i)
onto $Y_{t-j}^{-}$ for all j > 0,

$$\text{i.e., } \text{Span}\{\mathbf{z}(t)\} \perp y(t-i) \mid Y_{t-j}^{-}, \quad j > 0. \tag{9}$$

Proof: The proof follows from the fact that $\mathbf{z}(t)$ is
orthogonal to the space $Y_t^{-}$, and $Y_{t-j}^{-}$ is a subspace of $Y_t^{-}$
for all j > 0. ■

2.2 Discussion:

Theorem 1 and its corollaries have a number of interesting
implications. These can be summarised as follows:

(a) The linear space $Y_t^{-}$ of all past observations and their
nonlinear combinations can be orthogonally decomposed
into two subspaces, namely, a subspace spanned by the n
immediately past innovation vectors and a subspace
spanned by all observations more than n steps in the past (i.e. $Y_{t-n}^{-}$).

(b) Since $y(t-i) \mid Y_{t-j}^{-}$ can be regarded as the minimum
variance estimate of y(t-i) based on the observations
spanning the space $Y_{t-j}^{-}$, corollary 1.2 implies that the
innovation process at time t is orthogonal to all i-step
ahead predictors at time (t-j), with j > 0 and i being an
arbitrary integer. This is a significant and interesting
result.

We next consider the structure of the process y(t)
in terms of the innovations process.

2.3 The representation of y(t) and its predictors in
terms of innovations:

(Generalised Wold's Decomposition)

Let y(t) be a discrete time stationary process and
let the innovation be defined as above. Define
$E[\mathbf{z}(t)\mathbf{z}(t)^{T}] = G$ and assume Trace(G) < ∞. Then it can be easily
shown that y(t) can be written as a weighted linear
combination of the present and the past values of the
innovation vectors, i.e., we have in the mean square sense

$$y(t) = \sum_{i=0}^{\infty} H_i\, \mathbf{z}(t-i) \tag{10}$$

where $H_i$ are appropriate row vectors ($1 \times (N+2)$) and

$$\sum_{i=0}^{\infty} \ldots \tag{11}$$

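Proof: Using theorem 1, we can write the direct sum
decomposition of $Y_{t+1}^{-}$ as:

$$Y_{t+1}^{-} = Z_t \oplus Z_{t-1} \oplus \cdots \oplus Z_{t-p} \oplus Y_{t-p}^{-} \tag{12}$$

An orthogonal projection of y(t) onto the space $Y_{t+1}^{-}$
yields:

$$y(t) \mid Y_{t+1}^{-} = y(t) \mid Z_t + y(t) \mid Z_{t-1} + \cdots + y(t) \mid Z_{t-p} + y(t) \mid Y_{t-p}^{-} \tag{13}$$

The projection of X onto Z is

$$X \mid Z = \langle X, Z \rangle \langle Z, Z \rangle^{-1} Z, \tag{14a}$$

where $\langle X, Z \rangle = E[XZ]$, so that

$$y(t) \mid Z_{t-i} = H_i\, \mathbf{z}(t-i),$$

where $H_i = E[y(t)\mathbf{z}(t-i)^{T}]\ E[\mathbf{z}(t-i)\mathbf{z}(t-i)^{T}]^{-1}$.

From (13) we get

$$y(t) \mid Y_{t+1}^{-} = y(t) = H_0\, \mathbf{z}(t) + H_1\, \mathbf{z}(t-1) + \cdots + H_p\, \mathbf{z}(t-p) + y(t) \mid Y_{t-p}^{-} \qquad (\because\ y(t) \in Y_{t+1}^{-})$$

Since by definition

$$y(t) \mid Y_t^{-} = H_1\, \mathbf{z}(t-1) + \cdots + H_p\, \mathbf{z}(t-p) + y(t) \mid Y_{t-p}^{-}$$

and $\mathbf{z}(t) \triangleq \mathbf{y}(t) - \mathbf{y}(t) \mid Y_t^{-}$, we get $H_0 = [\,1\ 0\ 0\ \ldots\ 0\,]$ and

$$y(t) \mid Y_{t-p}^{-} = y(t) - H_0\, \mathbf{z}(t) - H_1\, \mathbf{z}(t-1) - \cdots - H_p\, \mathbf{z}(t-p).$$

In order to prove the lemma it is sufficient to prove that:

$$\lim_{p \to \infty} \left\| y(t) \mid Y_{t-p}^{-} \right\|^{2} = 0 \tag{16}$$

Using the result of theorem 1 again, this time for $Y_{t-p}^{-}$,
we get

$$Y_{t-p}^{-} = Z_{t-p-1} \oplus Y_{t-p-1}^{-}$$

Orthogonal projection of y(t) on $Y_{t-p}^{-}$ yields:

$$y(t) \mid Y_{t-p}^{-} = y(t) \mid Z_{t-p-1} + y(t) \mid Y_{t-p-1}^{-} = H_{p+1}\, \mathbf{z}(t-p-1) + y(t) \mid Y_{t-p-1}^{-} \tag{17}$$

so that

$$\left\| y(t) \mid Y_{t-p}^{-} \right\|^{2} = \left\| H_{p+1}\, \mathbf{z}(t-p-1) + y(t) \mid Y_{t-p-1}^{-} \right\|^{2} \tag{18}$$

Using corollary 1.2 we can write

$$\langle\, \mathbf{z}(t-k),\ y(t) \mid Y_{t-i}^{-} \,\rangle = 0, \quad \text{for } k \le i,$$

from which it follows that (18) can be reduced to

$$\left\| y(t) \mid Y_{t-p}^{-} \right\|^{2} = H_{p+1}\, G\, H_{p+1}^{T} + \left\| y(t) \mid Y_{t-p-1}^{-} \right\|^{2},$$

where the first term on the R.H.S. is positive, being the square of
a norm. It follows that

$$\left\| y(t) \mid Y_{t-p}^{-} \right\|^{2} > \left\| y(t) \mid Y_{t-p-1}^{-} \right\|^{2} \quad \text{or} \quad \frac{\left\| y(t) \mid Y_{t-p-1}^{-} \right\|}{\left\| y(t) \mid Y_{t-p}^{-} \right\|} < 1.$$

Thus $\left\| y(t) \mid Y_{t-p}^{-} \right\|$ is a strictly decreasing sequence.
Further noting that $Y_{-\infty}^{-}$ is a null space, we have
$y(t) \mid Y_{-\infty}^{-} = 0$ and we can write

$$\lim_{p \to \infty} \left\| y(t) \mid Y_{t-p}^{-} \right\|^{2} = 0,$$

thus proving the generalised Wold's decomposition
theorem.

Thus we have

$$\lim_{p \to \infty} \left\| y(t) - \sum_{i=0}^{p} H_i\, \mathbf{z}(t-i) \right\|^{2} = 0$$

(or)

$$\left\| \lim_{p \to \infty} \left[ y(t) - \sum_{i=0}^{p} H_i\, \mathbf{z}(t-i) \right] \right\|^{2} = 0$$

(since the norm is continuous)

(or)

we have in the mean square sense

$$y(t) = \sum_{i=0}^{\infty} H_i\, \mathbf{z}(t-i) \qquad ■$$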

2.3.1 Discussion:

This is similar to Wold's decomposition

$$y(t) = \sum_{i=0}^{\infty} h_i\, z(t-i),$$

where $z(t) \triangleq y(t) - y(t) \mid Y_t^{-}$. That result is obtained for a
Gaussian process in a linear framework and can
provide only a minimum phase solution for a nonminimum
phase process. In our work, by contrast, the result is in
a multi-input, single-output setting: y(t) is the
output of a multi-input system whose inputs are the
elements of the innovation vector $\mathbf{z}(t-i)$, and the resulting
structure can provide a complete solution for a
nonminimum phase process [6].

If $H_i = h_i\,[\,1\ 0\ \ldots\ 0\,]$, (10) reduces to the well
known Wold's decomposition in the linear Gaussian case.
2.3.2  Corollary  1.3  : 

The i-step ahead predictor of y(t) can be written
as:

$$y(t+i) \mid Y_{t+1}^{-} = \hat{y}(t+i \mid t) = y(t+i) - \sum_{m=0}^{i-1} H_m\, \mathbf{z}(t+i-m)$$

where $\hat{y}(t+i \mid t)$ denotes the i-step ahead predictor.
Proof: The direct sum orthogonal decomposition of $Y_{t+i+1}^{-}$
can be written using theorem 1 as:

$$Y_{t+i+1}^{-} = Z_{t+i} \oplus Z_{t+i-1} \oplus \cdots \oplus Z_{t+1} \oplus Y_{t+1}^{-}.$$

Projecting y(t+i) orthogonally onto this space and noting
that $y(t+i) \in Y_{t+i+1}^{-}$,
we get

$$y(t+i) \mid Y_{t+i+1}^{-} = y(t+i) = y(t+i) \mid Z_{t+i} + y(t+i) \mid Z_{t+i-1} + \cdots + y(t+i) \mid Z_{t+1} + y(t+i) \mid Y_{t+1}^{-}$$

and hence the result

$$y(t+i) \mid Y_{t+1}^{-} = \hat{y}(t+i \mid t) = y(t+i) - \sum_{m=0}^{i-1} H_m\, \mathbf{z}(t+i-m)$$

follows as desired. ■

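Corollary 1.4 :

The i-step ahead predictor of y(t) can also be expressed as:

$$y(t+i) \mid Y_{t+1}^{-} = \hat{y}(t+i \mid t) = \sum_{m=0}^{\infty} H_{m+i}\, \mathbf{z}(t-m) \tag{20}$$

Proof: We know that

$$y(t+i) = \sum_{j=0}^{\infty} H_j\, \mathbf{z}(t+i-j) \tag{21}$$

From corollary 1.3 we have

$$\begin{aligned}
y(t+i) \mid Y_{t+1}^{-} = \hat{y}(t+i \mid t) &= y(t+i) - \sum_{m=0}^{i-1} H_m\, \mathbf{z}(t+i-m) \\
&= \sum_{j=0}^{\infty} H_j\, \mathbf{z}(t+i-j) - \sum_{m=0}^{i-1} H_m\, \mathbf{z}(t+i-m) \\
&= \sum_{j=i}^{\infty} H_j\, \mathbf{z}(t+i-j) \\
&= \sum_{n=0}^{\infty} H_{n+i}\, \mathbf{z}(t-n)
\end{aligned}$$

Hence the result (20). ■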
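The index shift used in the proof of corollary 1.4 can be checked numerically. The following is a minimal sketch under stated assumptions: the weights `H` and the vectors `z` are random placeholders (not innovations estimated from data), and the expansion (10) is truncated to `K` terms. It only confirms that the two predictor expressions agree algebraically.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, T = 3, 8, 40             # vector size N + 2 (here N = 1), number of weights, length
H = rng.normal(size=(K, d))    # hypothetical row vectors H_0 .. H_{K-1}; H_j = 0 for j >= K
z = rng.normal(size=(T, d))    # hypothetical innovation vectors z(0) .. z(T-1)

def y_at(s):
    """y(s) = sum_j H_j z(s - j), i.e. equation (10) truncated to K terms."""
    return sum(H[j] @ z[s - j] for j in range(K))

def predictor_cor13(t, i):
    """Corollary 1.3: subtract the i most recent innovation terms from y(t + i)."""
    return y_at(t + i) - sum(H[m] @ z[t + i - m] for m in range(i))

def predictor_cor14(t, i):
    """Corollary 1.4: weighted sum of innovations up to time t."""
    return sum(H[m + i] @ z[t - m] for m in range(K - i))

t, i = 20, 3
p13 = predictor_cor13(t, i)
p14 = predictor_cor14(t, i)
```

Both expressions give the same number, since removing the terms m = 0, ..., i-1 from (21) leaves exactly the sum in (20).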

2.3.2.1 Discussion:

The results of corollaries 1.3 and 1.4 relate the i-step
ahead predictors and prediction errors to the past
innovations. Corollary 1.3 expresses the i-step ahead
prediction error $[\,y(t+i) - \hat{y}(t+i \mid t)\,]$ as a weighted sum of
the i most recent innovations, with $\{H_m\}$
as the weighting coefficient vectors.

Corollary 1.4 gives a representation of the i-step
ahead predictor in terms of past innovation vectors. Note
the interesting similarity of the representation of y(t) in
(10) and of $\hat{y}(t+i \mid t)$ in (20).

The most important and useful feature of
corollary 1.3 is that, unlike corollary 1.4, this result
enables us to express the i-step ahead predictor in terms of
a finite dimensional basis. More specifically, the i-step
ahead predictor is obtained here by subtracting from y(t+i)
the information not contained in the infinite dimensional
space $Y_{t+1}^{-}$.


3.  Finite  dimensionality  of  the  generalised 
predictor  space: 

With  the  results  of  the  earlier  section  at  our 
disposal,  in  this  section  we  will  prove  the  finite 
dimensionality  of  the  generalised  predictor  space,  if  the 
process  admits  a  linear  nonGaussian  ARMA 
representation. 

Theorem 2: Let y(t) be an ARMA process of order (p,q),
then the predictor space, where N = max(p,q), is finite
dimensional and is spanned by the predictors
$\{\, y(t-1) \mid Y_{t-N}^{-},\ y(t-2) \mid Y_{t-N}^{-},\ y(t-3) \mid Y_{t-N}^{-},\ \ldots,\ y(t-N) \mid Y_{t-N}^{-} \,\}$,
Where  y^  =1. 

Proof:

$$y(t) + a_1 y(t-1) + \cdots + a_p y(t-p) = w(t) + b_1 w(t-1) + \cdots + b_q w(t-q) \tag{22}$$

where w(t) is an i.i.d. nonGaussian white process.

Assuming p ≥ q, N = max(p,q) = p.
Projecting both sides of (22) onto $Y_{t-N}^{-}$ we get

$$y(t) \mid Y_{t-N}^{-} + a_1\, y(t-1) \mid Y_{t-N}^{-} + \cdots + a_p\, y(t-p) \mid Y_{t-N}^{-} = w(t) \mid Y_{t-N}^{-} + b_1\, w(t-1) \mid Y_{t-N}^{-} + \cdots + b_q\, w(t-q) \mid Y_{t-N}^{-} \tag{23}$$

Clearly the R.H.S. of (23) = 0.

By using the induction principle we can show that

$y(t+i) \mid Y_{t-N}^{-}$ = linear combination of $[\, y(t-1) \mid Y_{t-N}^{-},\ \ldots,\ y(t-p) \mid Y_{t-N}^{-} \,]$ for any i ≥ 0.

Multiplying (22) by y(t-i) on both sides (i ≥ N+1), we get

$$y(t)y(t-i) + a_1\, y(t-1)y(t-i) + \cdots + a_p\, y(t-p)y(t-i) = w(t)y(t-i) + b_1\, w(t-1)y(t-i) + \cdots + b_q\, w(t-q)y(t-i) \tag{24}$$

Projecting both sides onto $Y_{t-N}^{-}$ we get

$$\sum_{j=0}^{p} a_j\, [\,y(t-j)y(t-i)\,] \mid Y_{t-N}^{-} = 0, \quad \text{for } i \ge N+1,\ a_0 = 1. \tag{25}$$

Considering (24) for i ≤ N and by using the relation

$$y(t) = \sum_{k=0}^{\infty} h_k\, w(t-k)$$

(where $h_k$ are impulse response coefficients)
we get

$$\sum_{j=0}^{p} a_j\, [\,y(t-j)y(t-i)\,] \mid Y_{t-N}^{-} = \sum_{j=0}^{q} b_j\, [\,w(t-j)y(t-i)\,] \mid Y_{t-N}^{-} = \sum_{j=i}^{q} b_j\, [\,w(t-j)y(t-i)\,] \mid Y_{t-N}^{-} = \sum_{j=i}^{q} b_j\, h_{j-i} = k_i \tag{26}$$

(where $k_i$ are constants)

Combining (25) and (26) we get

$$y(t)y(t-i) \mid Y_{t-N}^{-} + a_1\, y(t-1)y(t-i) \mid Y_{t-N}^{-} + \cdots + a_p\, y(t-p)y(t-i) \mid Y_{t-N}^{-} - k_i \mid Y_{t-N}^{-} = 0, \quad \forall\ i. \tag{27}$$

where $k_i$ are some arbitrary constants and are equal to
zero for all i ≥ N+1.

Using (27) we can show that all the predictors formed by
the nonlinear terms, i.e., $y(t+i)y(t+i) \mid Y_{t-N}^{-}$, $y(t+i)y(t+i-1) \mid Y_{t-N}^{-}$,
..., $y(t+i)y(t+i-p) \mid Y_{t-N}^{-}$ for all i ≥ 0, can be
expressed in terms of the assumed basis of the predictor space, i.e.,
$y(t-1) \mid Y_{t-N}^{-}$, $y(t-2) \mid Y_{t-N}^{-}$, $y(t-3) \mid Y_{t-N}^{-}$, ..., $y(t-N) \mid Y_{t-N}^{-}$.

This completes the proof. ■

4.  Application  to  nonGaussian  ARMA 
processes: 

Here  we  present  a  result  for  representation  of 
nonGaussian  ARMA  processes,  which  will  be  useful  in  the 
estimation  of  parameters  of  the  process;  and  the  relevant 
results  will  be  published  elsewhere.  It  is  expected  to 
provide  consistent  estimates  of  the  AR  parameters,  as  it 
makes  use  of  more  than  p+1  slices  of  higher  order 
statistical  terms  due  to  the  expanded  space[2]. 

Theorem 3:

If y(t) admits the nonGaussian ARMA(N,N)
representation,

$$y(t) + a_1 y(t-1) + \cdots + a_N y(t-N) = w(t) + b_1 w(t-1) + \cdots + b_N w(t-N)$$

then y(t) admits the following representation:

$$y(t) + a_1 y(t-1) + \cdots + a_N y(t-N) = B_0\, \mathbf{z}(t) + B_1\, \mathbf{z}(t-1) + \cdots + B_N\, \mathbf{z}(t-N), \tag{28}$$

where
$\mathbf{z}(t-i)$ is as defined in (6) and N is the memory of
nonlinearity in the predictor space representation.

Proof:

$$y(t) + a_1 y(t-1) + \cdots + a_N y(t-N) = w(t) + b_1 w(t-1) + \cdots + b_N w(t-N)$$

Projecting both sides onto $Y_{t-N}^{-}$ we get

$$y(t) \mid Y_{t-N}^{-} + a_1\, y(t-1) \mid Y_{t-N}^{-} + \cdots + a_N\, y(t-N) \mid Y_{t-N}^{-} = 0$$

By writing the predictors in terms of the innovations $\mathbf{z}(t-i)$
we get the desired result, where

$$\begin{aligned}
H_0 &= B_0 \\
H_1 + a_1 H_0 &= B_1 \\
&\ \ \vdots \\
H_N + a_1 H_{N-1} + a_2 H_{N-2} + \cdots + a_N H_0 &= B_N
\end{aligned}$$

This results in the desired expression (28). ■
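The recursion relating the vectors B_k to the AR coefficients and the weights H_k at the end of the proof is a truncated convolution, B_k = a_0 H_k + a_1 H_{k-1} + ... + a_k H_0 with a_0 = 1. A small sketch follows; the function name and the numerical values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def innovation_ma_vectors(a, H):
    """Compute B_0..B_N from AR coefficients a = [a_1..a_N] and rows H_0..H_N of H.

    B_k = sum_{j=0}^{k} a_j H_{k-j}, with a_0 = 1 (the relations listed after (28)).
    """
    a_full = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    N = len(a)
    H = np.asarray(H, dtype=float)
    B = np.zeros((N + 1, H.shape[1]))
    for k in range(N + 1):
        for j in range(k + 1):
            B[k] += a_full[j] * H[k - j]
    return B

# Hypothetical example with N = 2 and two-component row vectors.
a = [0.5, -0.2]
H = [[1.0, 0.0],
     [2.0, 1.0],
     [0.0, 3.0]]
B = innovation_ma_vectors(a, H)
```

Here B_0 = H_0, B_1 = H_1 + 0.5 H_0 and B_2 = H_2 + 0.5 H_1 - 0.2 H_0.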


5.  Conclusions: 

An approach to representing nonGaussian/
nonminimum phase random processes has been proposed.
The results are presented for second order nonlinearity for
ease of presentation, though the generalisation of the
theory to pth order nonlinearity is straightforward.
Application of this theory to the parameter estimation of
nonGaussian ARMA processes is being investigated. We can
also expect better estimates using this extended space, as
we are able to utilise more statistical information about the
process. A result to this effect has been given by Bondon
et al. [3]. A more general version of the representation,
of which the present case is a special case, has also been
obtained; its results are to be published
elsewhere.
Abstract

This communication presents a necessary and sufficient
condition for the factorizability of higher order spectra
of complex signals. This condition is based on the
symmetries of higher order spectra and on an extension
of a formula proposed by Marron, Sanchez and Sullivan
for unwrapping phases of third order spectra [1]. It is
an identity between products of higher order spectra.
Our factorizability test requires no phase unwrapping.

[1] J. C. Marron, P. P. Sanchez and R. C. Sullivan
(1990): "Unwrapping algorithm for least-squares phase
recovery from the modulo 2π bispectrum phase," J. Opt.
Soc. Am. A, Vol. 7, pp. 14-20, January 1990.


1.  Introduction 

The  problem  of  higher  order  factorizability  has  been 
studied  by  several  researchers.  Tekalp  and  Erdem  [13] 
and  Pan  and  Nikias  [10],  base  their  development  on  the 
higher  order  “cepstrum”  and  check  whether  the  support 
of  this  cepstrum  reduces  to  a  precise  subspace  or  not. 
The  difficulty  in  their  approach  is  related  to  the  multiple 
determination  of  a  complex  number  logarithm.  This 
implies  the  use  of  a  phase  unwrapping  procedure.  Dianat 
and  Raghuveer  [6]  propose  an  approach  that  is  valid 
only  when  MA  modelling  at  a  given  order  is  possible. 
Alshebeili [1] has proposed a condition in the form of an
LDU decomposition of the cumulant matrix (at order 3).
These three kinds of methods establish
factorizability essentially by computing functions closely
related to the higher order spectral factor, i.e. the values
of the system impulse response.

For  a  different  purpose,  Marron,  Sanchez  and 
Sullivan  [9]  have  given  a  recursive  formula  performing 
phase  unwrapping  of  third  order  spectra.  Their  method 
is  based  on  a  relation  between  different  values  of  a 
linear  combination  of  the  third  order  spectrum  phases.  It 
does  not  perform  an  implicit  computation  of  the  higher 
order  spectral  factor. 


In this communication we intend to show that the
Marron, Sanchez and Sullivan (MSS) formula can
actually be considered as a basis for developing a
necessary and sufficient condition for factorizability:
(1) we generalize the MSS third order formula to the case of
higher order spectra of complex signals; (2) we give the
expression of this generalization of Marron's formula directly
as a function of the higher order spectrum and not of the
higher order spectrum logarithm (modulus logarithm
and phase); the formula is an identity between products
of higher order spectra in the frequency domain; (3) we
show that this generalization of Marron's formula is
equivalent to the expression of factorizability in the
cepstral domain [13] provided that the higher order
spectral symmetries are satisfied; this gives the
necessary and sufficient conditions for factorizability;
(4) we show that this formulation in the frequency
domain does not raise a phase unwrapping problem.

2.  A  generalization  of  MSS  formula 


We consider the third order spectrum $S_3(u,v)$ of a
non Gaussian real random process,

$$S_3(u,v) = E\{\, X(u)\, X(v)\, X(-u-v) \,\}. \tag{1}$$

When $S_3(u,v)$ is factorizable and if its phase is
correctly unwrapped, the following identity is satisfied
[9]:

$$\psi_3(u+w, v) + \psi_3(u, w) = \psi_3(u+v, w) + \psi_3(u, v) \tag{2}$$

where $\psi_3(u,v)$ is the phase of $S_3(u,v)$. This formula
can be generalized as shown below (a detailed
development is given in [7]).

We consider the N-th order spectra of complex signals,
Fourier transforms of cumulants,

$$S_N(w_1,\ldots,w_{N-1}) = \rho_N(w_1,\ldots,w_{N-1})\, \exp j\psi_N(w_1,\ldots,w_{N-1}). \tag{3}$$

If factorizability is satisfied, then

$$S_N(w_1,\ldots,w_{N-1}) = \left[ \prod_{p=1}^{N/2} H(w_p) \right] \left[ \prod_{q=N/2+1}^{N-1} H^{*}(-w_q) \right] H^{*}\!\left( \sum_{r=1}^{N-1} w_r \right) \tag{4}$$

where H(w) is the N-th order spectral factor. When (4)
is satisfied the spectral factor H(w) is unique up to a
linear phase term. Eq. (4)
generalizes expressions found in [11][12], as done in [2]
with a different choice of the variables. The expression
of (4) given in terms of phases is

$$\psi_N(w_1,\ldots,w_{N-1}) = \sum_{p=1}^{N/2} \phi(w_p) - \sum_{q=N/2+1}^{N-1} \phi(-w_q) - \phi\!\left( \sum_{r=1}^{N-1} w_r \right) \tag{5}$$

where $\phi(w)$ is the phase of H(w).


Fig.  1.  MSS  formula  at  order  4 

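When (5) holds, the unwrapped higher order
spectrum phases satisfy:

$$\psi_N\!\left(u + \sum_{p=1}^{N-2} w_p,\ v_1,\ldots,v_{N-2}\right) + \psi_N(u, w_1,\ldots,w_{N-2}) = \psi_N\!\left(u + \sum_{p=1}^{N-2} v_p,\ w_1,\ldots,w_{N-2}\right) + \psi_N(u, v_1,\ldots,v_{N-2}). \tag{6}$$

(See the illustration in fig. 1). The result is verified by
direct development. A similar equation holds in the case
of the modulus logarithm (note the terms in $\log\rho_2$):

$$\log\rho_N\!\left(u + \sum_{p=1}^{N-2} w_p,\ v_1,\ldots,v_{N-2}\right) + \log\rho_N(u, w_1,\ldots,w_{N-2}) - \log\rho_2\!\left(u + \sum_{p=1}^{N-2} w_p\right)$$
$$= \log\rho_N\!\left(u + \sum_{p=1}^{N-2} v_p,\ w_1,\ldots,w_{N-2}\right) + \log\rho_N(u, v_1,\ldots,v_{N-2}) - \log\rho_2\!\left(u + \sum_{p=1}^{N-2} v_p\right). \tag{7}$$

Consequently, we have the following identity which
holds even when the phase is not unwrapped (this
identity is verified by direct development):

$$\frac{S_N\!\left(u + \sum_{p=1}^{N-2} w_p,\ v_1,\ldots,v_{N-2}\right) S_N(u, w_1,\ldots,w_{N-2})}{S_2\!\left(u + \sum_{p=1}^{N-2} w_p\right)} = \frac{S_N\!\left(u + \sum_{p=1}^{N-2} v_p,\ w_1,\ldots,w_{N-2}\right) S_N(u, v_1,\ldots,v_{N-2})}{S_2\!\left(u + \sum_{p=1}^{N-2} v_p\right)} \tag{8}$$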
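Identity (8) can be checked numerically at order N = 4. The sketch below assumes a hypothetical spectral factor H(w) given by three complex FIR taps, builds the fourth order spectrum in the factorized form (4), and evaluates both sides of (8). It also evaluates the two sides for a deliberately non-factorizable spectrum (an extra phase term u·w1·w2, chosen only for illustration), for which the identity fails.

```python
import numpy as np

h = np.array([1.0, 0.6 - 0.3j, -0.2 + 0.5j])   # hypothetical complex FIR taps

def H(w):
    """Spectral factor H(w) = sum_k h_k exp(-j w k)."""
    k = np.arange(len(h))
    return np.sum(h * np.exp(-1j * w * k))

def S2(w):
    """Second order spectrum |H(w)|^2."""
    return H(w) * np.conj(H(w))

def S4(u, w1, w2):
    """Fourth order spectrum in the factorized form (4) with N = 4."""
    return H(u) * H(w1) * np.conj(H(-w2)) * np.conj(H(u + w1 + w2))

def S4_bad(u, w1, w2):
    """A spectrum that is not of the form (4): extra phase u * w1 * w2."""
    return S4(u, w1, w2) * np.exp(1j * u * w1 * w2)

def mss_sides(S, u, w, v):
    """Left and right sides of identity (8) for N = 4 (w and v are pairs)."""
    lhs = S(u + sum(w), *v) * S(u, *w) / S2(u + sum(w))
    rhs = S(u + sum(v), *w) * S(u, *v) / S2(u + sum(v))
    return lhs, rhs

u, w, v = 0.3, (0.5, -0.2), (0.7, 0.4)
lhs, rhs = mss_sides(S4, u, w, v)
lhs_bad, rhs_bad = mss_sides(S4_bad, u, w, v)
```

For the factorized spectrum the two sides agree to rounding error without any phase unwrapping, while for the perturbed spectrum their ratio differs from 1 by a phase of about 0.19 rad.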

3.  Symmetries  of  complex  signals  higher 
order  spectra 

The factorizability condition is based on the MSS
formula, but also on the symmetries of the higher order
spectra of complex signals. Studies on these symmetries can be
found in [3][4][8]. Here we give the matricial operators
associated with the transformations of the space that
keep the higher order spectrum invariant. We consider
only even orders. We have the two types of operations
(M = N/2 - 1):

(a) The permutations of the (M + 1) first variables or of
the M last variables do not modify
$S_N(u, v_1,\ldots,v_M, w_1,\ldots,w_M)$. Any of the (M + 1)! M!
matrices of the form

$$P = \begin{bmatrix} Q_{M+1} & 0 \\ 0 & Q_M \end{bmatrix} \tag{9}$$

where $Q_{M+1}$ and $Q_M$ are permutation matrices of
dimension (M + 1) and M respectively, keeps the
higher order spectrum invariant.

(b) The following change of variables:

$$\begin{aligned}
u' &= u + v_1 + \cdots + v_M + w_1 + \cdots + w_M, \\
v_1' &= -w_1,\ \ldots,\ v_M' = -w_M, \\
w_1' &= -v_1,\ \ldots,\ w_M' = -v_M,
\end{aligned} \tag{10}$$

transforms the higher order spectrum into its complex
conjugate. As a consequence, when applied to the
variables $(u, v_1,\ldots,v_M, w_1,\ldots,w_M)$, the operator T,

$$T = \begin{bmatrix} 1 & \mathbf{1}_M^{T} & \mathbf{1}_M^{T} \\ \mathbf{0} & 0 & -I_M \\ \mathbf{0} & -I_M & 0 \end{bmatrix},$$

written here in block form with $\mathbf{1}_M$ the all-ones vector and $I_M$ the identity of dimension M,
keeps the higher order spectrum invariant (except for
the complex conjugation). One can verify that products
of matrices of type P and T form a finite group that
keeps invariant the subspace support of higher order
spectra of band limited stationary signals.
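The action of the operator T can be illustrated with a short numerical check. This sketch builds T from the change of variables (10) for M = 2 (order N = 6), and verifies on a factorizable spectrum that applying T conjugates the spectrum and that T is its own inverse. The spectral factor taps and the test point are hypothetical.

```python
import numpy as np

def symmetry_operator(M):
    """Matrix T acting on (u, v_1..v_M, w_1..w_M), built from (10)."""
    n = 2 * M + 1
    T = np.zeros((n, n))
    T[0, :] = 1.0                       # u' = u + sum(v) + sum(w)
    T[1:M + 1, M + 1:] = -np.eye(M)     # v' = -w
    T[M + 1:, 1:M + 1] = -np.eye(M)     # w' = -v
    return T

h = np.array([1.0, 0.6 - 0.3j, -0.2 + 0.5j])   # hypothetical complex FIR taps

def H(w):
    k = np.arange(len(h))
    return np.sum(h * np.exp(-1j * w * k))

def S(x, M):
    """Factorizable spectrum of order N = 2M + 2 evaluated at x = (u, v, w)."""
    u, v, w = x[0], x[1:M + 1], x[M + 1:]
    value = H(u) * np.conj(H(np.sum(x)))
    value *= np.prod([H(vi) for vi in v])
    value *= np.prod([np.conj(H(-wi)) for wi in w])
    return value

M = 2
T = symmetry_operator(M)
x = np.array([0.3, 0.5, -0.2, 0.7, 0.4])        # hypothetical frequency point
s_original = S(x, M)
s_transformed = S(T @ x, M)
```

The transformed value equals the complex conjugate of the original one, and T @ T is the identity, consistent with T generating a finite group together with the permutations P.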

4.  A  necessary  and  sufficient  condition  for 
factorizability 

Theorem: $S_N(u_1,\ldots,u_{N-1})$ is factorizable in the
form (4) if and only if (8) is satisfied for a nonzero value
of $(v_1,\ldots,v_{N-2})$ and $S_N(u_1,\ldots,u_{N-1})$ satisfies the
higher order spectral symmetries of section 3.

The choice of the vector $(v_1,\ldots,v_{N-2})$ requires
some care: if $S_N(u_1,\ldots,u_{N-1})$ is periodic, its period
must be different from a multiple of $\sum_{p=1}^{N-2} v_p$. If
$S_N(u_1,\ldots,u_{N-1})$ is periodic and discrete, which is often
the case in applications based on the DFT, it is always
possible to choose $(v_1,\ldots,v_{N-2}) = (\Delta v, 0,\ldots,0)$ where
$\Delta v$ is the frequency resolution.

Proof  :  The  fact  that  the  condition  is  necessary  was 
shown  in  section  2. 

4.1.  Proof  of  the  condition  on  the  phases  when  they 
are  correctly  unwrapped 

Symmetry  (10)  implies  that  = 

If we know a correct (for example the continuous)
determination of the higher order spectrum phase, and
not only its determination in the range [-π, π[, we have

$$\psi_N\!\left(u + \sum_{p=1}^{N-2} w_p,\ v_1,\ldots,v_{N-2}\right) + \psi_N(u, w_1,\ldots,w_{N-2}) = \psi_N\!\left(u + \sum_{p=1}^{N-2} v_p,\ w_1,\ldots,w_{N-2}\right) + \psi_N(u, v_1,\ldots,v_{N-2}). \tag{12}$$

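(a) Expression of the higher order spectrum phase as a
sum

When $(v_1,\ldots,v_{N-2})$ is fixed, the difference between
the two functions of (N-1) variables in (12),

$$\psi_N\!\left(u + \sum_{p=1}^{N-2} v_p,\ w_1,\ldots,w_{N-2}\right) \quad \text{and} \quad \psi_N(u, w_1,\ldots,w_{N-2}),$$

is the difference between two functions of one variable:

$$\psi_N(u, w_1,\ldots,w_{N-2}) - \psi_N\!\left(u + \sum_{p=1}^{N-2} v_p,\ w_1,\ldots,w_{N-2}\right) = \psi_N(u, v_1,\ldots,v_{N-2}) - \psi_N\!\left(u + \sum_{p=1}^{N-2} w_p,\ v_1,\ldots,v_{N-2}\right). \tag{13}$$

If $\psi_N(u, w_1,\ldots,w_{N-2})$ is a periodic function of u, we
suppose that we have chosen a vector $(v_1,\ldots,v_{N-2})$
such that $\ell = \sum_{p=1}^{N-2} v_p$ and its multiples are different from
its period; otherwise, the MSS formula could not be used to
check the factorizability. In order to take advantage of
this property, we decompose $\psi_N(u, w_1,\ldots,w_{N-2})$ into a
sum:

$$\psi_N(u, w_1,\ldots,w_{N-2}) = f(u) + g\!\left(u + \sum_{p=1}^{N-2} w_p\right) + h(u, w_1,\ldots,w_{N-2}), \tag{14}$$

where $h(u, w_1,\ldots,w_{N-2})$ holds no additive terms
depending only on u or on $\left(u + \sum_{p=1}^{N-2} w_p\right)$.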

(b) $h(u, w_1,\ldots,w_{N-2})$ does not depend on u

Using (14) in computing the difference (13) yields:

$$\begin{aligned}
\psi_N(u, w_1,\ldots,w_{N-2}) - \psi_N(u + \ell, w_1,\ldots,w_{N-2})
&= f(u) + g\!\left(u + \sum_{p=1}^{N-2} w_p\right) + h(u, w_1,\ldots,w_{N-2}) \\
&\quad - f(u + \ell) - g\!\left(u + \ell + \sum_{p=1}^{N-2} w_p\right) - h(u + \ell, w_1,\ldots,w_{N-2}) \\
&= \psi_N(u, v_1,\ldots,v_{N-2}) - \psi_N\!\left(u + \sum_{p=1}^{N-2} w_p,\ v_1,\ldots,v_{N-2}\right)
\end{aligned} \tag{15}$$

with $\ell = \sum_{p=1}^{N-2} v_p$. The last line of (15) holds only two
terms, functions respectively of u and of $\left(u + \sum_{p=1}^{N-2} w_p\right)$.
Consequently,

$$h(u, w_1,\ldots,w_{N-2}) = h(u + \ell, w_1,\ldots,w_{N-2}). \tag{16}$$

If $(v_1,\ldots,v_{N-2})$ is chosen appropriately,
$\psi_N(u, w_1,\ldots,w_{N-2})$ is not periodic of period $\ell$ in u,
and neither is $h(u, w_1,\ldots,w_{N-2})$. So, this last function
does not depend on u:

$$h(u, w_1,\ldots,w_{N-2}) = h'(w_1,\ldots,w_{N-2}), \tag{17}$$

and (14) becomes

$$\psi_N(u, w_1,\ldots,w_{N-2}) = f(u) + g\!\left(u + \sum_{p=1}^{N-2} w_p\right) + h'(w_1,\ldots,w_{N-2}). \tag{18}$$

(c) Application of the symmetries

Now we can apply the changes of variables of
section 3 that keep the higher order spectrum invariant.
First, the permutation of u and any of the first
(N/2 - 1) variables yields:

$$h'(w_1,\ldots,w_p,\ldots,w_{N/2-1}, w_{N/2},\ldots,w_{N-2}) - f(w_p) = h'(w_1,\ldots,u,\ldots,w_{N/2-1}, w_{N/2},\ldots,w_{N-2}) - f(u), \tag{19}$$

so that

$$h'(w_1,\ldots,w_{N/2-1}, w_{N/2},\ldots,w_{N-2}) = \sum_{p=1}^{N/2-1} f(w_p) + h''(w_{N/2},\ldots,w_{N-2}), \tag{20}$$

where $h''$ denotes the remaining function of the last variables.
If we use the transformation T,

$$h''(w_{N/2},\ldots,w_{N-2}) = -\sum_{p=N/2}^{N-2} f(-w_p) \tag{21}$$

and

$$g(u) = -f(u). \tag{22}$$

Finally

$$\psi_N(u, w_1,\ldots,w_{N-2}) = f(u) + \sum_{p=1}^{N/2-1} f(w_p) - \sum_{p=N/2}^{N-2} f(-w_p) - f\!\left(u + \sum_{p=1}^{N-2} w_p\right), \tag{23}$$

which expresses the factorizability.

4.2.  On  the  computation  of  the  correct  phase 
determination  (phase  unwrapping) 

If (12) is satisfied, we have the expression (23) of
$\psi_N(u, w_1,\ldots,w_{N-2})$ provided that $\psi_N(u, w_1,\ldots,w_{N-2})$
is a correct determination of the HOS phase in the range
[-4π, 4π[. We intend to show that it is always possible to
compute the correct determination
$\psi_N(u, w_1,\ldots,w_{N-2})$ from the one that is known,
$\tilde{\psi}_N(u, w_1,\ldots,w_{N-2})$ (given in the range [-π, π[):

$$\psi_N(u, w_1,\ldots,w_{N-2}) = \tilde{\psi}_N(u, w_1,\ldots,w_{N-2}) + 2\pi\, m(u, w_1,\ldots,w_{N-2}), \tag{24}$$

where m is an integer. We consider the values of
$\tilde{\psi}_N(u, v, 0,\ldots,0)$ (the third order spectrum phase). A
subset of these data can be used to reconstruct the
spectral factor phase f(u) using a recursive
multiresolution algorithm without raising phase
unwrapping difficulties [8]. The recursive structure of
the algorithm allows, in theory, the reconstruction of
f(u) for all u at any frequency resolution. When f(u)
is known, by using (5), it is possible to compute the
correct determination (24) of $\psi_N(u, w_1,\ldots,w_{N-2})$
satisfying (6). (Besides, this was the initial object of the
MSS recursion.) Starting from the HOS data used in
this multiresolution method, there is only one manner of
reconstructing recursively $\psi_N(u, w_1,\ldots,w_{N-2})$
satisfying (6). So, as the phase reconstructed through the
computation of f(u) satisfies (6), it gives the
determination satisfying (24).

Consequently, when (8) is satisfied, it is always
possible to reconstruct the determination of the phase
allowing the development of section 4.1, since the
spectral factor is unique. The knowledge of
$\tilde{\psi}_N(u, w_1,\ldots,w_{N-2})$ in the range [-π, π[ is sufficient.

4.3.  Factorizability  of  the  higher  order  spectrum 
modulus 

A similar development applies to the modulus
logarithm. When the HOS symmetries and (7) are
satisfied, the modulus logarithm can be written in the
form

$$\log\rho_N(u, w_1,\ldots,w_{N-2}) = \lambda(u) + \lambda(u + w_1 + \cdots + w_{N-2}) + \lambda(w_1) + \cdots + \lambda(w_{N/2-1}) + \lambda(-w_{N/2}) + \cdots + \lambda(-w_{N-2}). \tag{25}$$

When (7) is satisfied,

$$2\lambda(u) = \log |S_2(u)|, \tag{26}$$

which gives the modulus of the spectral factor. This
implies the factorizability of the modulus.

Consequently, if (8) and the HOS symmetries hold,
$S_N(u, w_1,\ldots,w_{N-2})$ is factorizable.
Abstract

This paper concerns the estimation of the parameters
that describe a stable distribution. Stable distributions are
characterised by four parameters which can be estimated
using a number of methods and although approximate
maximum likelihood estimation (MLE) techniques do exist, they
are computationally intensive. There are a number of
techniques that are much faster than MLE and these are the focus
of this paper. These techniques are compared and
contrasted both for stable random variables and for teletraffic data.


1  Introduction 

A wide range of modelling areas, encompassing
economics, telecommunications, channel noise modelling and
signal interpolation, have received treatment with stable
distributions (see section 1.4 in [10]). The reasons for this
are that the environment to be modelled has tended to be
more impulsive than the Gaussian case, and that the development
of more powerful computers has made working with stable
processes a more realistic proposition.

The lack of a closed form expression for the pdf of almost
all stable distributions makes exact MLE techniques impossible,
and it was only in 1971 that DuMouchel developed an
approximate MLE technique [2]. However this technique is
computationally intensive and does not lend itself to the real
time estimation of stable parameters. We wish to concern
ourselves with the quicker, more approximate techniques
that have been developed.

This paper begins by introducing the salient properties of
the stable distribution. It then goes on to discuss the
methods we consider in this paper. The estimation of the parameters of random
variables drawn from a truly stable distribution is the focus
of section 4, and this allows us to compare how different
techniques perform when the actual parameter values are
known. In section 5 we go on to estimate stable parameters
for compressed video data. In this case the correct values
for the stable distribution are not known. However we use
an inverse Fourier transform technique to determine the
accuracy of each technique's estimates. Finally we finish with
a number of conclusions and in appendix A we give details
of how the software used in this paper may be obtained.

2  Stable  Distribution  Parameters 

All stable distributions can be uniquely expressed by their
characteristic function, which is given by

$$\varphi(t) = \exp\{\, j a t - \gamma |t|^{\alpha} [\, 1 + j\beta\, \mathrm{sign}(t)\, \omega(t, \alpha) \,] \,\},$$

where $\omega(t, \alpha) = \tan(\alpha\pi/2)$
for α ≠ 1 and $(2/\pi) \log |t|$ for α = 1, and sign(t) = 1 for
t > 0, 0 for t = 0 and -1 for t < 0.

The parameters α, a, β and γ describe completely a stable
distribution. a (-∞ < a < ∞) is the location parameter, α
(0 < α ≤ 2) is termed the characteristic exponent, β (-1 ≤ β
≤ 1) is the index of skew and γ (0 < γ < ∞) is termed
the dispersion parameter. Some estimation techniques find
a value for c, which is equal to $\gamma^{1/\alpha}$.

In almost every case stable distributions do not have a
closed form probability density function (pdf). The only
exceptions to this are the cases α = 2 (Gaussian),
α = 1, β = 0 (Cauchy) and α = 0.5, β = -1 (Pearson).
For more details about the properties of stable distributions
see [10] and [11].
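The characteristic function above is straightforward to evaluate numerically. The following is a minimal sketch in this parameterisation; the function name and argument order are assumptions made for the example. The two closed form cases mentioned in the text (Gaussian and Cauchy) serve as sanity checks.

```python
import numpy as np

def stable_cf(t, alpha, beta, gamma, a):
    """Characteristic function exp{j a t - gamma |t|^alpha [1 + j beta sign(t) omega(t, alpha)]}."""
    t = np.asarray(t, dtype=float)
    if alpha == 1.0:
        with np.errstate(divide="ignore"):
            omega = (2.0 / np.pi) * np.log(np.abs(t))
        omega = np.where(t == 0, 0.0, omega)      # sign(0) = 0, so the value at t = 0 is irrelevant
    else:
        omega = np.tan(alpha * np.pi / 2.0)
    exponent = 1j * a * t - gamma * np.abs(t) ** alpha * (1.0 + 1j * beta * np.sign(t) * omega)
    return np.exp(exponent)

t = np.linspace(-3.0, 3.0, 13)
gaussian = stable_cf(t, alpha=2.0, beta=0.0, gamma=0.5, a=1.0)   # normal with mean 1, variance 2*gamma = 1
cauchy = stable_cf(t, alpha=1.0, beta=0.0, gamma=2.0, a=0.0)     # Cauchy with dispersion 2
```

With α = 2 the dispersion γ corresponds to half the variance, which is why γ = 0.5 reproduces the unit variance Gaussian characteristic function.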

3  Estimators 

The  techniques  under  consideration  in  this  paper  fall  into 
three  categories.  Firstly  there  are  those  that  perform  their 
estimates  based  on  the  quantiles  of  the  distribution  [9].  Then 
there  is  the  technique  developed  by  Kogon  which  employs  an 
empirical  estimate  of  the  characteristic  function  [4].  Finally 
there  are  the  fractional  lower  order  moment  (FLOM)  based 
methods  developed  by  Nikias,  Ma  and  Tsihrintzis  [8],  [12]. 
In  this  section  we  briefly  introduce  these  techniques  and 
finish  with  a  summary  of  their  valid  estimation  ranges. 

3.1  The  quantile  based  method 

Estimation techniques based on the quantiles of an
empirical sample were first suggested by Fama and Roll [3].
However their technique was limited to symmetric
distributions and suffered from a small asymptotic bias. McCulloch
developed a technique that uses five quantiles from a sample
and estimates α, β, c and a over a wider range of the
parameter space without asymptotic bias [9].

3.2  The  characteristic  function  based  method 

Stable distributions can be uniquely defined by their
characteristic function (the Fourier transform of their pdf). This
implies that an empirical approximation of the function may
be used to estimate the stable parameters for that sample.
It was Koutrouvelis who first developed a method that took
advantage of this approach [5], [6]. However, in [4] Kogon
develops a modified approach that yields better results
in almost all cases and reduces the amount of computation
required.

3.3  The  FLOM  based  methods 

A stable random variable with characteristic exponent
$1 < \alpha < 2$ has infinite variance but finite mean. If the
random variable has $\alpha \leq 1$ then both the mean and variance
are infinite. More generally, for $\alpha < 2$ we can say that
$E(|X|^p) = \infty$ if $p \geq \alpha$ and $E(|X|^p) < \infty$ if
$p < \alpha$. We can therefore use the behaviour of
these FLOMs to determine an estimate for $\alpha$. In [8] Ma and
Nikias present the log FLOM method and in [12] Tsihrintzis
and Nikias present the extreme order method.
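To make the idea of estimating $\alpha$ from logarithmic moments concrete, the sketch below uses the known relation $\mathrm{Var}(\log|X|) = (\pi^2/6)(1/\alpha^2 + 1/2)$ for symmetric stable variables and inverts it. It is a minimal illustration under that assumption and is not claimed to reproduce the exact estimator of [8]; the function name `alpha_from_log_moments` is hypothetical.

```python
import math
import random


def alpha_from_log_moments(samples):
    """Estimate alpha of a symmetric stable sample from Var(log|X|)."""
    logs = [math.log(abs(x)) for x in samples if x != 0.0]
    n = len(logs)
    mean = sum(logs) / n
    var = sum((v - mean) ** 2 for v in logs) / (n - 1)
    # Invert Var(log|X|) = (pi^2 / 6) * (1 / alpha^2 + 1 / 2).
    inv_alpha_sq = 6.0 * var / math.pi ** 2 - 0.5
    if inv_alpha_sq <= 0.25:  # would imply alpha >= 2, so clip to 2
        return 2.0
    return 1.0 / math.sqrt(inv_alpha_sq)


# Demonstration on Cauchy data (alpha = 1), generated by inverse transform.
random.seed(0)
cauchy = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(20000)]
alpha_hat = alpha_from_log_moments(cauchy)
```

Because $\log|X|$ has finite moments of all orders even when $X$ has infinite variance, the sample variance used here is well behaved for heavy-tailed data.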

3.4  Summary 


Table  1  summarises  the  permissible  range  of  parameter 
values  for  each  of  the  estimation  methods.  Parameters  that 
can  be  estimated  by  a  given  technique  are  marked  with  a  *. 


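[Table 1 is only partly legible in the source. Columns: Name, $\alpha$ (inferred), $\beta$, $\gamma$, $a$. Legible entries: first row (name illegible): $\beta$: $(-1, 1)$*, $\gamma$: $(0, \infty)$*, $a$: $(-\infty, \infty)$*; Kogon: entries illegible; Ma: $\beta$: $(0)$; remaining entries illegible.]

Table 1. Permissible ranges of estimation.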

4  Estimating  parameters  for  stable  random 
variables 


In  this  section  we  test  the  accuracy  of  the  estimation 
techniques  when  the  data  is  drawn  from  a  stable  distribution 
with  known  parameter  values.  In  all  the  cases  in  this  section 
we  performed  the  estimation  for  1000  independent  sets  of 
data,  each  of  length  1000,  and  recorded  the  mean  and  the 
95%  confidence  interval  of  the  estimate. 


4.1  Generating  stable  random  variables 

We generated the stable random variables using a method
based on the work of Chambers, Mallows and Stuck (CMS)
[1] and reproduced in [11].
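For the symmetric case ($\beta = 0$, $\gamma = 1$, $a = 0$) the CMS construction reduces to a closed-form transformation of one uniform and one exponential variate. The sketch below shows that special case only; it is not the authors' C code, and the function name `symmetric_stable_sample` is an assumption of this example.

```python
import math
import random


def symmetric_stable_sample(alpha, rng=random):
    """Draw one symmetric alpha-stable variate with unit dispersion
    (CMS formula restricted to beta = 0)."""
    v = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against a zero exponential draw
        w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(v)  # Cauchy special case
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))
```

With $\alpha = 2$ the output is Gaussian with variance 2, which matches $\gamma = 1$ in the characteristic function $\exp(-\gamma t^2)$.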

4.2  Results 

The first results consider the case of estimating $\alpha$ when
$\beta = 0$, $c = 1$ and $a = 0$. Figure 1 plots the mean and 95%
confidence interval for $\hat{\alpha}$ for the four estimation techniques.

The next series of results is concerned with the estimation
of $\alpha$ and $\beta$ for series of stable random variables drawn from
a skewed distribution. We only used the results from those
estimators capable of estimating $\beta$, i.e. the estimators of
Kogon and McCulloch. We have plotted the results for
$\beta = -1$, $-0.5$, $0.5$ and $1$ in figures 2-5.

Next we wished to consider the effects of mis-applying
a symmetric estimator to an asymmetric series. In this case
we used the FLOM based techniques to estimate $\alpha$ from data
drawn from a skewed distribution ($\beta = -0.5$ and $0.5$, figure 6).

To investigate the estimation of $\gamma$ we generated data and
then scaled it using the mapping $X \rightarrow kX$. Since the
CMS algorithm generates random variables with $\gamma = 1$, the
expected estimate of the scale parameter $c$ ($c = \gamma^{1/\alpha}$) is
$k$. We considered the symmetric case for $k = 0.1$ and $k = 10$
(figures 7 and 8).

4.3  Discussion 

There are many points that can be made about the results
in section 4.2; the main ones we wish to make are listed
below.

1. In figure 1 McCulloch's method performs best when
$\alpha > 0.5$ (as expected) with a slight drop in the accuracy
of $\hat{\alpha}$ as $\alpha \rightarrow 2$.

2. Tsihrintzis's method tends to overestimate $\alpha$ although
it gives very similar results for $\beta = 0$ and $\beta = \pm 0.5$.
This is unexpected as the method is based on the
assumption of symmetry.

3. Kogon's estimation technique tends to perform better
than McCulloch's over the range $\alpha \in (0, 2]$, $\beta \in
[-1, 1]$, $c = 1$ and $a = 0$ (figures 2-5).

4. The method of Ma performs just as well at estimating
$\alpha$ when $\beta = \pm 0.5$ as when $\beta = 0$. This suggests
that the FLOM techniques may be more robust to the
value of $\beta$ than previously thought.

5. Kogon's $\hat{c}$ are very poor when $k = 10$ but improve
when $k = 0.1$. This is due to the flattening effect
larger $\gamma$ have on the characteristic function.




6. Tsihrintzis's $\hat{c}$ are overbiased when $k = 10$ and are
underbiased when $k = 0.1$.

7. McCulloch's and Ma's estimates for $c$ are superior to those
of the other two methods. Ma slightly outperforms McCulloch
over the range $\alpha \in (0, 2]$.

5  Estimating  parameters  for  real  data 

In this section we turn our attention to determining
stable parameters for real data, to which we are
attempting to fit a stable distribution. One problem is
determining whether the estimated parameters produce a
distribution which matches the data well. To test this we perform
a goodness of fit test between the estimated pdf of the data
and the pdf of the estimated stable distribution. We have
already mentioned that no closed form expression for the
pdf exists in most cases of stable distribution, but we can
determine it by calculating the inverse Fourier transform of
the characteristic function (section 2.3 in [10]).
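A minimal numerical sketch of this inversion, restricted to the symmetric case with $a = 0$, is given below. It uses $f(x) = (1/\pi)\int_0^\infty \exp(-\gamma t^{\alpha})\cos(xt)\,dt$ with a plain trapezoidal rule. The function name, the truncation point `t_max` and the step count are assumptions chosen for illustration, and the truncation is only adequate when $\alpha$ is not too small.

```python
import math


def symmetric_stable_pdf(x, alpha, gamma=1.0, t_max=50.0, steps=20000):
    """Approximate the pdf of a symmetric stable law (beta = 0, a = 0)
    by trapezoidal integration of its characteristic function."""
    h = t_max / steps
    total = 0.5  # half weight of the integrand at t = 0, where it equals 1
    for k in range(1, steps):
        t = k * h
        total += math.exp(-gamma * t ** alpha) * math.cos(x * t)
    # Half weight of the integrand at the truncation point.
    total += 0.5 * math.exp(-gamma * t_max ** alpha) * math.cos(x * t_max)
    return total * h / math.pi
```

For $\alpha = 2$ and $\alpha = 1$ the result can be checked against the closed forms $\exp(-x^2/4)/(2\sqrt{\pi})$ and $1/(\pi(1 + x^2))$ respectively.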

5.1  The  video  data 

The  video  data  consists  of  171000  values  representing 
the  numbers  of  bits/frame  for  a  JPEG  encoded  version  of 
the  movie  Star  Wars.  The  salient  statistics  of  the  data  are 

$N = 171000$, $\sigma = 6254.2$, $\min = 8622$,
$\mu = 27791.2$, $\max = 78459$.

5.2  The  Ethernet  data 

The packet data was generated from a well studied Ethernet
trace collected at Bellcore Labs [7]. From this trace
we constructed the "activity" per second data set (measured
in bytes/s). We then differenced this set to obtain a distribution
that was less one sided and more suitable for estimation.
The statistics of the data are

$N = 3143$, $\sigma = 79018$, $\min = -377870$,
$\mu = 24.3$, $\max = 395970$.

5.3  Estimation  procedure  and  results 

In section 4.2 we discovered several important points
about the performance of the estimation techniques. One
point is that Kogon's technique gives good $\alpha$ and $\beta$
estimates but only when $c$ is close to 1. Therefore McCulloch's
technique was applied first to the real data, which was then scaled
using $\hat{c}$ and $\hat{a}$. Since we found evidence to suggest the
FLOM methods work well even when $\beta \neq 0$, we applied
them to the data. Finally Kogon's technique was applied
and all the estimated values were recorded in tables 2 and 3.


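[Table 2 is only partly legible in the source. Legible entries, in reading order: McCulloch: 1.898, 27856; Ma: 1.985, -; Tsihrintzis: NaN; further entries -0.008 and - that cannot be assigned to a row or column.]

Table 2. Results for JPEG video data.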


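[Table 3 is only partly legible in the source. Legible entries, in reading order: first row (name illegible): 41924, 1577; Ma: -, 0.878, -, -; further entries 0.000 and -0.033; Kogon: -. Column assignments cannot be recovered.]

Table 3. Results for Ethernet activity data.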


JPEG  results 

The $\hat{\alpha}$ was close to 2 for every technique (except Tsihrintzis),
which suggests that the $\hat{\beta}$ are very unstable. This explains
the large (and impossible) value for $\hat{\beta}$ from Kogon's
estimator.

The NaN (Not a Number) results for the Tsihrintzis
method result from the fact that the method must take the
square root of a value which in this case was negative. Since
no similar result occurred when testing with stable random
variables, this may suggest the data in question does not
approximate to a stable distribution.

In  order  to  determine  the  performance  of  the  estimation 
techniques  we  calculated  the  approximate  pdf  of  the  stable 
distribution  with  the  estimates  found  using  McCulloch’s, 
Ma’s  and  Kogon’s  techniques.  The  results  are  compared 
with  the  pdf  of  the  original  data  in  figure  9. 

Ethernet  results 

In this case, even though the distribution was again symmetric
in appearance (see figure 10), the $\hat{\alpha}$ were significantly
less than 2 in all cases. This suggests that the distribution is
more impulsive than for the JPEG case. It also means that the
$\hat{\beta}$ are more stable and have more influence over the shape
of the distribution. An interesting point is that it is the Ma
technique's estimate of $\alpha$ that achieves the best match, even
though Kogon's technique estimates $|\hat{\beta}| > 0$.

6  Conclusions 

In  this  paper  we  have  directly  compared  four  quick  stable 
parameter  estimation  techniques,  with  both  stable  random 
variables  and  two  forms  of  real  data.  The  results  contain  a 
wealth  of  information  but  we  have  only  space  to  make  a  few 
salient  observations. 

These experiments were carried out to learn more about
how we should apply these techniques to real data estimation.
In that respect we suggest that:

1. Scaling is critical for Kogon's technique.

2. Tsihrintzis's and Ma's techniques can be applied, even
if it appears that $|\beta| > 0$.

3. More than one technique should be applied until more
about the performance of these techniques is understood.

Tables 2 and 3 record our estimates for two real data sets, and
the differences between the results for the different techniques
are marked in some cases. Until we learn more about why
this is the case we should be careful when applying these
techniques to real data.

Appendix  A 

C code for the generation and estimation techniques discussed
in this paper is available from the authors on request.

Figure 2. Results for $\hat{\alpha}$ and $\hat{\beta}$ when $\beta = -1$.

Figure 3. Results for $\hat{\alpha}$ and $\hat{\beta}$ when $\beta = -0.5$.

[Plot: estimated parameter (vertical axis, -0.5 to 2.5) versus actual alpha (horizontal axis, 0 to 1.8); legend: McCulloch alpha, Kogon alpha, McCulloch beta, Kogon beta.]

Figure 4. Results for $\hat{\alpha}$ and $\hat{\beta}$ when $\beta = 0.5$.

Figure 5. Results for $\hat{\alpha}$ and $\hat{\beta}$ when $\beta = 1$.

Figure 8. Results for $\hat{c}$ when $k = 10$.

JPEG data.

Figure 6. Results for $\hat{\alpha}$ when the FLOM based techniques
are applied to a skewed distribution.


Abstract

In this paper, a parametric Fourier series based
model (FSBM) for or as an approximation to an arbitrary
nonminimum-phase linear time-invariant (LTI)
system is proposed for statistical signal processing
applications where a model for LTI systems is needed.
Based on the FSBM, a (minimum-phase) linear prediction
error (LPE) filter for amplitude estimation of
the unknown LTI system together with the Cramer Rao
(CR) bounds is presented. Then an iterative algorithm
for obtaining the optimum mean-square LPE filter with
finite data is presented which is also an approximate
maximum likelihood algorithm when data are Gaussian.
Then three iterative algorithms using higher-order
statistics with finite non-Gaussian data are presented
for estimating parameters of the FSBM, followed
by some simulation results to support the efficacy of
the proposed algorithms. Finally, we draw some
conclusions.


1.  Introduction 

In many statistical signal processing areas such
as signal modeling, power spectral and polyspectral
estimation, system identification, deconvolution and
equalization, a widely known problem is the identification
and estimation of an unknown linear time-invariant
(LTI) system $h(n)$ driven by an unknown
random signal $u(n)$ with only a given set of output
measurements $x(n)$

$$x(n) = u(n) * h(n) = \sum_{k=-\infty}^{\infty} u(k)h(n-k) \qquad (1)$$

*This work is supported by the National Science Council
under Grant NSC 86-2213-E-007-037.


The system function $H(z)$ is often modeled as a parametric
rational function such as an autoregressive (AR)
model, moving average (MA) model or autoregressive
moving average (ARMA) model, and therefore finding
a rational model approximation to the system from
data becomes a parameter estimation problem.

Recently, Chien, Yang and Chi [1] proposed a parametric
cumulant based method for estimating the
phase of the unknown system $h(n)$ through allpass
filtering of measurements $x(n)$ when $x(n)$ is non-Gaussian.
Their method is applicable for both 1-D and
2-D systems. They used a Fourier series based model
(FSBM) for an allpass filter which leads to a consistent
estimate for the system phase by maximizing a single
absolute higher-order cumulant of the allpass filter output.

In this paper, an FSBM for or as an approximation
to an arbitrary nonminimum-phase LTI system is
proposed for applications in the aforementioned statistical
signal processing areas. A linear prediction error
(LPE) filter based on the FSBM for amplitude estimation
of the system is proposed, and then estimation of
FSBM (amplitude and phase) parameters is presented,
followed by some simulation results and conclusions.

2.  Nonminimum-phase  FSBM 

Assume that $h(n)$ is a real nonminimum-phase LTI
system with the frequency response $H(\omega) = H(z =
\exp\{j\omega\})$ defined as

$$H(\omega) = \exp\left\{\sum_{i=1}^{p} \alpha_i \cos(i\omega) + j\sum_{i=1}^{p} \beta_i \sin(i\omega)\right\} \qquad (2)$$

Two advantages of the pth-order nonminimum-phase
FSBM defined by (2) over the rational model (i.e., AR,
MA and ARMA models) for stable LTI systems are
discussed as follows.

The FSBM given by (2) is always a stable IIR system
no matter whether it is causal or noncausal. Therefore,
in practical applications where the system to be designed
is a noncausal stable system, such as the noncausal
inverse filter $1/H(\omega)$ (when $h(n)$ is not minimum-phase)
in blind deconvolution and channel equalization,
the FSBM given by (2) is more suitable than the
ARMA model because the stability issue never arises
for the former, thus leading to a more efficient design
and processing procedure.

The complex cepstrum $\tilde{h}(n)$ (inverse Fourier transform
of $\ln[H(\omega)]$), which has been used in speech
deconvolution, source separation of speech signals and
seismic deconvolution, associated with the FSBM given
by (2) can be easily shown to be

$$\tilde{h}(n) = \frac{1}{2}\sum_{i=1}^{p}\{(\alpha_i + \beta_i)\delta(n+i) + (\alpha_i - \beta_i)\delta(n-i)\} \qquad (3)$$

without need of finding poles and zeros of the system
[2] as the ARMA model requires.

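Next, let us present the minimum-phase FSBM and the
allpass FSBM. The FSBM for $H(\omega)$ given by (2) can also
be expressed as

$$H(\omega) = H_{MP}(\omega) \cdot H_{AP}(\omega) \qquad (4)$$

where the FSBM

$$H_{MP}(\omega) = \exp\left\{\sum_{i=1}^{p} \alpha_i \cos(i\omega) - j\sum_{i=1}^{p} \alpha_i \sin(i\omega)\right\} \qquad (5)$$

can be shown to be a causal stable minimum-phase
system with $|H_{MP}(\omega)| = |H(\omega)|$ and $h_{MP}(0) = 1$, and
$H_{AP}(\omega)$ is an allpass FSBM given by

$$H_{AP}(\omega) = \exp\left\{j\sum_{i=1}^{p} \gamma_i \sin(i\omega)\right\} \qquad (6)$$

where

$$\gamma_i = \alpha_i + \beta_i \qquad (7)$$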

3.  FSBM  for  LPE  filters 

Let us briefly review the conventional LPE filter, which is
needed later for the presentation of the FSBM for
LPE filters.

A. Conventional LPE filters

Assume that $x(n)$ is a real stationary random process
modeled by (1) where $h(n)$ is a stable LTI system
driven by a white noise $u(n)$ with zero mean and variance
$\sigma_u^2$. The conventional pth-order LPE filter [3]

$$A_p(z) = 1 + \sum_{i=1}^{p} a_i z^{-i} \qquad (8)$$

(a causal FIR filter) processes $x(n)$ such that the prediction
error

$$e(n) = x(n) * a_p(n) = x(n) + \sum_{k=1}^{p} a_k x(n-k) \qquad (9)$$

has minimum variance or average power $E[e^2(n)]$. The
optimum LPE filter $A_p(z)$ is minimum-phase and can
be obtained by the following orthogonality principle:

$$E[e(n)x(n-k)] = 0, \quad k = 1, 2, \ldots, p \qquad (10)$$

leading to a set of symmetric Toeplitz linear equations
of $\mathbf{a}_p = (a_1, \ldots, a_p)^T$. When $x(n)$ is an AR(p) Gaussian
process, for any unbiased estimates $\hat{\mathbf{a}}_p$ and $\hat{\sigma}_u^2$ with
finite data, their covariance matrix is lower bounded
by the following Cramer Rao (CR) bounds [3]:

$$\frac{1}{N}\begin{bmatrix} \sigma_u^2 \mathbf{R}_{xx}^{-1} & \mathbf{0} \\ \mathbf{0}^T & 2\sigma_u^4 \end{bmatrix} \qquad (11)$$

where $N$ is the total number of data and $\mathbf{R}_{xx} = E[\mathbf{x}\mathbf{x}^T]$
in which $\mathbf{x} = (x(n), \ldots, x(n-p+1))^T$.

B.  LPE  filters  with  FSBM 

Let the pth-order LPE filter $v_p(n)$ be a causal stable
minimum-phase IIR filter with $v_p(0) = 1$ and

$$V_p(\omega) = \exp\left\{\sum_{i=1}^{p} a_i \cos(i\omega) - j\sum_{i=1}^{p} a_i \sin(i\omega)\right\} \qquad (12)$$

and thus the prediction error is given by

$$e(n) = x(n) * v_p(n) = x(n) + \sum_{k=1}^{\infty} v_p(k)x(n-k) \qquad (13)$$

Note that we have used the same notation $a_i$ for parameters
of both the proposed FSBM LPE filter $V_p(\omega)$
and the conventional LPE filter $A_p(z)$. The optimum
LPE filter $V_p(\omega)$ is described in the following theorem.

Theorem 1. Assume that $H(\omega)$ is an FSBM given
by (2) with order equal to $q$ instead of $p$, and $e(n)$ is
the prediction error given by (13) with the LPE filter
order $p \geq q$. Then the optimum LPE filter $V_p(\omega) =
1/H_{MP}(\omega)$ with $\min\{E[e^2(n)]\} = E[u^2(n)] = \sigma_u^2$.

The optimum prediction error $e(n)$ must satisfy
$\partial E[e^2(n)]/\partial a_k = 0$, from which one can prove the
following orthogonality principle:

$$E[e(n)e(n-k)] = r_{ee}(k) = 0, \quad k = 1, 2, \ldots, p \qquad (14)$$

However, (14) forms a set of nonlinear equations of $\mathbf{a}_p$.
Nevertheless, $e(n)$ will be a white process as $p$ is sufficiently
large, which implies that $V_p(\omega) = A_p(\omega)$ (identical
whitening filter) for $p = \infty$.

Based on Theorem 1, an iterative algorithm is used
to estimate or find an approximation to $H_{MP}(\omega)$ with
finite data $x(0), x(1), \ldots, x(N-1)$ as follows:

Algorithm 1. Amplitude estimation

(S1) Set $p_{max}$ (maximum of $p$), parameter $L$, increment
parameter $s \geq 1$, convergence parameter $\zeta$
and $t = 0$ (iteration number).

(S2) Set $t = t + 1$, $p = ts$ and $\mathbf{a}_p = (a_1, \ldots, a_p)^T$.
Search for the minimum of the objective function

$$J(\mathbf{a}_p) = \frac{1}{N-L}\sum_{n=L}^{N-1} e^2(n) \qquad (15)$$

and the associated optimum $\hat{\mathbf{a}}_p$ by a gradient
type iterative optimization algorithm (such as the
well-known Fletcher-Powell algorithm) with the
initial condition $\mathbf{a}_p(0) = (\hat{\mathbf{a}}_{p-s}^T, \mathbf{0}^T)^T$.

(S3) If $p < p_{max}$ and $|J(\hat{\mathbf{a}}_p) - J(\hat{\mathbf{a}}_{p-s})|/J(\hat{\mathbf{a}}_{p-s}) > \zeta$,
go to (S2), otherwise

$$\hat{H}_{MP}(\omega) = 1/\hat{V}_p(\omega) \quad (\text{or } \hat{\alpha}_i = -\hat{a}_i) \qquad (16)$$

$$\hat{\sigma}_u^2 = J(\hat{\mathbf{a}}_p). \qquad (17)$$

The optimum prediction error $e(n) = x(n) * \hat{v}_p(n)$
corresponds to amplitude equalized data, and the gradient
of $J(\mathbf{a}_p)$ with respect to $a_k$ needed in (S2) can
be shown to be

$$\frac{\partial J(\mathbf{a}_p)}{\partial a_k} = 2\hat{r}_{ee}(k) = \frac{2}{N-L}\sum_{n=L}^{N-1} e(n)e(n-k) \qquad (18)$$
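The objective function and gradient used in step (S2) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are taken to be zero before $n = 0$, the objective is normalized by the number of summed terms $N - L$, and the function name `fsbm_lpe_objective` is hypothetical. The filter coefficients $v_p(n)$ are obtained from the exponential-series recursion $n\,v_p(n) = \sum_k k\,a_k\,v_p(n-k)$.

```python
def fsbm_lpe_objective(x, a, L=0):
    """Return (J, gradient) for the FSBM LPE filter with parameters a."""
    n_samples = len(x)
    p = len(a)
    # Impulse response v_p(n) of exp{sum_i a_i exp(-j i omega)}.
    v = [1.0] + [0.0] * (n_samples - 1)
    for n in range(1, n_samples):
        v[n] = sum(k * a[k - 1] * v[n - k] for k in range(1, min(n, p) + 1)) / n
    # Prediction error by causal convolution, with x(n) = 0 for n < 0.
    e = [sum(v[k] * x[n - k] for k in range(n + 1)) for n in range(n_samples)]
    count = n_samples - L
    objective = sum(e[n] ** 2 for n in range(L, n_samples)) / count
    # Gradient: twice the sample correlation of e(n) at lag k, with e(n) = 0 for n < 0.
    gradient = [2.0 * sum(e[n] * e[n - k] for n in range(max(L, k), n_samples)) / count
                for k in range(1, p + 1)]
    return objective, gradient
```

The pair returned here could be passed to any gradient-based minimizer; the direct convolution costs $O(N^2)$ operations and is kept only for clarity.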

When $H(\omega)$ is a pth-order FSBM and $x(n)$ is Gaussian,
it can be shown that both $\hat{\boldsymbol{\alpha}}_p = -\hat{\mathbf{a}}_p$ and $\hat{\sigma}_u^2$
are approximate maximum-likelihood estimates. Moreover,
the CR bounds associated with any unbiased estimates
$\hat{\boldsymbol{\alpha}}_p$ and $\hat{\sigma}_u^2$ are given by

$$\frac{1}{N}\begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0}^T & 2\sigma_u^4 \end{bmatrix} \qquad (19)$$

where $\mathbf{I}$ is a $p \times p$ identity matrix. Note that the CR
bounds associated with AR parameters (see (11)) depend
on correlations of $x(n)$, while those associated
with $\alpha_i$ are uniform and independent of correlations of
$x(n)$. The CR bound associated with $\sigma_u^2$ is the same
for both FSBM and AR model.


4. Estimation of FSBM parameters

In this section, further with the assumption that
$u(n)$ is non-Gaussian with nonzero Mth-order ($\geq 3$)
cumulant (and thus $x(n)$ is also non-Gaussian), three
iterative algorithms are to be presented for the estimation
of parameters of the FSBM $H(\omega)$ given by (2).

The first two algorithms estimate the system amplitude
using Algorithm 1 and system phase using Chien,
Yang and Chi's phase estimation algorithm [1] which
maximizes a single absolute Mth-order ($\geq 3$) sample
cumulant, denoted $\hat{C}_{M,y}$, of the phase equalized (allpass
filtered) data

$$y(n) = x(n) * g_{AP}(n) \qquad (20)$$

where $g_{AP}(n)$ is a pth-order allpass FSBM

$$G_{AP}(\omega) = \exp\left\{j\sum_{i=1}^{p} b_i \sin(i\omega)\right\} \qquad (21)$$

It has been shown in [1] that the optimum $G_{AP}(\omega)$
turns out to be a phase equalizer, i.e.,

$$\arg\{G_{AP}(\omega)\} = -\arg\{H(\omega)\} + \omega\tau \qquad (22)$$

Because $|\hat{C}_{M,y}|$ is a highly nonlinear function of $b_i$, one
can use gradient type iterative algorithms for finding
the optimum $b_i$. It has also been proven in [1] that

$$\frac{\partial y(n)}{\partial b_i} = \frac{1}{2}\{y(n+i) - y(n-i)\} \qquad (23)$$

which is needed for computing the gradient of $|\hat{C}_{M,y}|$
with respect to $b_i$. Next, let us present Algorithms 2
and 3, respectively.

Algorithm 2.

(S1) Estimate $\hat{H}_{MP}(\omega)$ and $\hat{\sigma}_u^2$ using Algorithm 1.

(S2) Find the optimum allpass FSBM $G_{AP}(\omega)$ given
by (21) using a gradient type iterative algorithm
such that $|\hat{C}_{M,y}|$ is maximum where $y(n) = x(n) *
g_{AP}(n)$. Then obtain the estimate $\hat{\beta}_i = -\hat{b}_i$.

Algorithm 3.

(S1) Estimate $\hat{H}_{MP}(\omega)$ and $\hat{\sigma}_u^2$, and obtain the optimum
prediction error $e(n) \approx u(n) * h_{AP}(n)$ using
Algorithm 1.

(S2) Find the optimum allpass FSBM $G_{AP}(\omega)$ given
by (21) using a gradient type iterative algorithm
such that $|\hat{C}_{M,y}|$ is maximum where $y(n) = e(n) *
g_{AP}(n)$. Then obtain the estimate $\hat{\gamma}_i = -\hat{b}_i$.
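The quantity maximized in step (S2) of Algorithms 2 and 3 is a sample cumulant of the allpass filter output. As a minimal sketch, assuming that the cumulant in question is the zero-lag cumulant of order 3 or 4 (the function name `sample_cumulant` and this restriction are assumptions of the example), it could be computed as:

```python
def sample_cumulant(y, order):
    """Zero-lag sample cumulant of order 2, 3 or 4 of the sequence y."""
    n = len(y)
    mean = sum(y) / n
    centred = [value - mean for value in y]
    m2 = sum(c ** 2 for c in centred) / n
    if order == 2:
        return m2
    if order == 3:
        return sum(c ** 3 for c in centred) / n
    if order == 4:
        # Fourth-order cumulant: fourth central moment minus 3 * variance^2.
        return sum(c ** 4 for c in centred) / n - 3.0 * m2 ** 2
    raise ValueError("only orders 2, 3 and 4 are supported in this sketch")
```

Both quantities vanish in expectation for Gaussian data, which is why a nonzero cumulant of the driving input is required by the phase estimation step.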

The last algorithm (Algorithm 4) estimates the system
amplitude and phase simultaneously using inverse
filter criteria [4-7]. Chi and Wu [4] proposed a family of
inverse filter criteria which includes Tugnait's criteria
[5], Wiggins' criterion [6] and Shalvi and Weinstein's
criterion [7] as special cases. The inverse filter $h_{INV}(n)$
is estimated by maximizing

$$J_{r,m} = \frac{|\hat{C}_{m,\hat{u}}|^r}{|\hat{C}_{r,\hat{u}}|^m} \qquad (24)$$

where $r$ is even and $m > r$, and

$$\hat{u}(n) = x(n) * h_{INV}(n) \qquad (25)$$

It has been shown in [4] that $\hat{u}(n) = b\,u(n - \tau)$ when
$h(n)$ is an arbitrary stable LTI system, where $b$ is a
scale factor and $\tau$ is an unknown time delay. Next, let
us present Algorithm 4.

Algorithm 4.

(S1) Set integer $r \geq 2$ (even) and integer $m > r$. Let
$H_{INV}(\omega) = 1/H(\omega)$ as given by (2) with $\alpha_i$ and
$\beta_i$ replaced by $-\alpha_i$ and $-\beta_i$, respectively.

(S2) Find the optimum $H_{INV}(\omega)$ (i.e., $\alpha_i$ and $\beta_i$) using
a gradient type iterative algorithm such that $J_{r,m}$
is maximum. Then $\sigma_u^2$ is estimated as the sample
variance of the obtained optimum inverse filter
output $\hat{u}(n)$.

It can be shown that

$$\frac{\partial \hat{u}(n)}{\partial \alpha_i} = -\frac{1}{2}\{\hat{u}(n+i) + \hat{u}(n-i)\} \qquad (26)$$

$$\frac{\partial \hat{u}(n)}{\partial \beta_i} = -\frac{1}{2}\{\hat{u}(n+i) - \hat{u}(n-i)\} \qquad (27)$$

which are needed for computing the gradient of $J_{r,m}$
with respect to $\alpha_i$ and $\beta_i$ in (S2), respectively.

Notice that when $H(\omega)$ is not minimum-phase, the
FSBM (noncausal stable IIR system) is well suited for
the noncausal inverse filter $H_{INV}(\omega) = 1/H(\omega)$ as mentioned
in Section 2. When $H(\omega)$ is an FSBM, the optimum
$\hat{u}(n) = u(n)$ (without scale factor and time delay
between $\hat{u}(n)$ and $u(n)$).

5.  Simulation  results 

In this section, let us show two sets of simulation
results. For the first simulation, the driving input $u(n)$
was assumed to be a zero-mean i.i.d. exponentially distributed
random sequence. An FSBM of order $p = 5$
was used for the system $H(\omega)$, whose amplitude and
phase (solid lines) are shown in Figures 1a and 1b, respectively.
Thirty independent runs were performed
using Algorithm 3 with $p_{max} = s = 5$, $L = 0$ and
$M = 3$. Mean (dashed line) and mean ± standard deviation
(dotted lines) of the obtained thirty amplitude
and phase estimates for $N = 1024$ and SNR = 20 dB
(white Gaussian noise) are also shown in Figures 1a and
1b, respectively. One can see that both the amplitude
and phase estimates are unbiased with small variance.
Moreover, the results obtained by Algorithms 2 and 4
are similar to those shown in Figure 1.

The second set of simulation results for seismic deconvolution
was obtained with $u(n)$ assumed to be a
Bernoulli-Gaussian sequence and the system (source
wavelet) $h(n)$ to be a third-order nonminimum-phase
causal ARMA system (taken from [4]) instead of an
FSBM. Figure 2a shows the synthetic data $x(n)$ for
$N = 512$ and SNR = 20 dB (white Gaussian noise).
Figures 2b and 2c show the (noncausal) estimate $\hat{h}(n)$
(dotted line) and the deconvolved signal $\hat{u}(n)$ (dotted
line), respectively, obtained using Algorithm 4 with
$p = 12$, $r = 2$ and $m = 4$, where the scale factor as
well as the time delay between $\hat{h}(n)$ and $h(n)$ (solid
line) and the time delay between $\hat{u}(n)$ and $u(n)$ (solid
line) were artificially removed. One can see that $\hat{h}(n)$
and $\hat{u}(n)$ are good approximations to $h(n)$ and $u(n)$,
respectively. The above simulation results support the
efficacy of the proposed algorithms.

6.  Conclusions 

We have presented an FSBM for or as an approximation
to an arbitrary nonminimum-phase LTI system
for applications in the statistical signal processing areas
mentioned in the introduction section. Based on
the FSBM, an LPE filter for amplitude estimation together
with the CR bounds, an algorithm for obtaining
the optimum LPE filter, and three algorithms for estimating
FSBM parameters were presented. All the
gradient type iterative optimization algorithms used in
the proposed algorithms have a computationally efficient
parallel structure (FIR filter banks with only two
nonzero coefficients 1/2 or -1/2) (see (23), (26) and
(27)). However, the gradient computation associated
with Algorithm 1 (see (18)) does not need any further
processing of the prediction error $e(n)$. Finally, two
sets of simulation results were presented to support the
efficacy of the proposed algorithms.

Figure 1. Simulation results for $N = 1024$ and
SNR = 20 dB using Algorithm 3. Mean (dashed
lines) and mean ± standard deviation (dotted lines)
of thirty (a) amplitude and (b) phase estimates together
with the amplitude and phase (solid lines) of
the system.


Figure 2. Simulation results for seismic deconvolution
($N = 512$ and SNR = 20 dB) using Algorithm
4. (a) Synthetic data $x(n)$; (b) source wavelet $h(n)$
(solid line) and estimate $\hat{h}(n)$ (dotted line); and (c)
input $u(n)$ (solid line) and deconvolved signal $\hat{u}(n)$
(dotted line).


Abstract

Many  underwater  acoustic  signal  processing 
algorithms  are  designed  for  use  in  stationary  and/or 
Gaussian  noise.  While  these  assumptions  are  often  valid 
for  applications  in  deep  water  ocean  areas,  they  may  not 
be  appropriate  for  shallow  water  areas,  especially  in  the 
presence  of  local  shipping  activity.  Local  shipping  also 
produces  spatial  correlation  in  the  noise  and  introduces 
additional  complexity  for  multichannel  processing.  In 
this  paper,  two  30-minute  sets  of  ambient  ocean  noise, 
recorded  near  the  San  Diego,  California  coast,  are 
analyzed  for  stationarity  and  Gaussianity  using  the 
Kolmogorov-Smirnov test. Since processing algorithms
based  on  higher  order  statistics  often  assume 
Gaussianity,  time-dependent  fluctuations  in  the  third  and 
fourth  order  cumulants  are  also  analyzed.  The  analysis 
reveals  significant  variability  in  the  time  lengths  of 
stationary  periods,  and  episodic  periods  of 
nonGaussianity  that  last  for  up  to  five  minutes. 
Statistical  fluctuations  appear  predominantly  in  the 
second  and  fourth  order  cumulants  rather  than  the  third 
order  cumulant.  The  shipping  noise  is  also  shown  to  be 
correlated  between  pairs  of  hydrophones  with  the  level  of 
correlation  varying  over  time  and  the  correlation  ranging 
from  positive  to  negative  with  increasing  channel 
separation. 

1.  Introduction 

The  assumption  that  ambient  noise  is  Gaussian  has 
led  to  attempts  to  exploit  the  higher  order  statistics  of 
underwater  acoustic  signals  for  detection.  Algorithms 


using  higher  order  statistics  for  both  stationary  and 
transient  signal  detection  have  shown  good  performance 
in  simulations  with  Gaussian  noise.  However,  if  the 
noise  is  nonGaussian,  then  the  performance  of  these 
detectors  can  be  degraded.  Real  noise  dominated  by 
shipping is likely to have time-varying cumulants,
leading to problems with algorithms that assume
stationarity.  The  individual  requirements  of  a  signal 
processing  algorithm  must  be  taken  into  consideration 
when  testing  for  stationarity  and  Gaussianity.  For 
example,  power  signal  detectors,  which  use  averaging  to 
achieve  noise  suppression,  perform  best  in  the  presence 
of  long  periods  of  stationarity,  while  transient  signal 
detectors  generally  require  short  periods  of  stationarity  or 
none  at  all. 

In  underwater  acoustics,  it  is  well  accepted  that 
shipping  noise  is  a  dominant  noise  source  at  frequencies 
between  20  and  500  Hz  and  that  there  are  time-dependent 
fluctuations  in  the  noise  [1].  The  time  scale  of 
fluctuations  depends  on  the  number,  density,  and 
operating  characteristics  of  ships,  the  proximity  of 
shipping  traffic  to  the  receiver,  the  receiver  location  in 
the  environment,  and  the  propagation  characteristics  of 
the area. The complexity of shallow-water environments
in conjunction with local shipping requires careful
consideration before making assumptions about ambient
noise.

Several  studies  of  the  higher  order  statistics  of  ocean 
noise have been published. In Ref. [2], Brockett et al.
use  a  third  order  Gaussianity  test  to  show  that  ambient 
ocean  noise  is  Gaussian  for  periods  of  over  a  minute,  but 
nonGaussian  for  shorter  periods,  while  local  shipping 
noise  (due  to  one  nearby  ship)  is  nonGaussian  for  both 
the  longer  and  shorter  time  periods.  A  normalized  third 
order statistic is used in Ref. [3] to determine that a 45 s
deep-water noise sequence is nonGaussian. In Ref. [4],
the  noise  generated  by  the  towing  platform  in  an 
experiment  is  shown  to  have  strong  bispectral 
components  and  is  thus  nonGaussian.  The  trispectrum 
is  used  in  Ref.  [5]  to  conclude  that  a  time  series 
generated  by  two  ships  passing  about  460m  and  635m 
from  a  sonobuoy  does  not  differ  significantly  from  a 
Gaussian  distribution.  These  studies  indicate  that  even 
in  deep  water,  ship  proximity  is  an  important  factor  in 
the  noise  field. 

In  this  work,  two  30-minute  sets  of  measured 
shallow  water  ambient  noise  dominated  primarily  by 
local  shipping  are  investigated  with  the  goal  of 
determining  whether  stationarity  and  Gaussianity 
assumptions  are  appropriate,  and  if  so,  for  what  time 
periods.  Third  and  fourth  order  cumulant  calculations  of 
the  noise  are  used  to  give  further  insight  into  the 
potential  effects  of  the  noise  on  algorithms  that  are  based 
on  higher  order  statistics.  Spatial  correlation  in  the  data 
is  also  investigated. 

The  noise  was  recorded  at  1500  samples/second  on  a 
vertical  array  placed  in  200m  water  near  the  port  of  San 
Diego,  California  [6].  The  stationarity  and  Gaussianity 
analyses  are  restricted  to  noise  recorded  on  two  of  the 
hydrophones,  or  channels:  channel  2  at  a  depth  of 
approximately  192  m,  and  channel  43  at  a  depth  of 
approximately  116  m.  The  correlation  analysis  includes 
data  from  channels  16  (170m)  and  31  (140m)  in  addition 
to  channels  2  and  43.  Ship  locations  during  the 
experiment  were  tracked  using  radar  and  updated 
approximately  every  two  minutes  [7].  In  both  noise 
sets, several ships are moving close to the array [8].

2. Time-dependent cumulant fluctuations

A 1-second sliding window is used to calculate the
fluctuating first through fourth order cumulants of the
noise. A 15-point overlap of the sliding window
corresponding to a cumulant sampling rate of 100
samples/second was found sufficient to avoid aliasing
distortions in the time-variation of the second, third, and
fourth order cumulants. The 30-minute noise sets are
divided into segments of approximately 1-minute
duration for processing. The average cumulants for the
set 1 noise, channels 2 and 43, are shown in Figs. 1 and
2. In these figures, each circle represents the average
cumulant over a 1-minute segment, and the dashed curves
represent the sliding window minimum and maximum
cumulant values within the 1-minute segment. The
corresponding curves for the set 2 noise are shown in
Figs. 3 and 4.

Visual inspection of the cumulants over the two
noise sets reveals local periods of nonstationarity in the
noise, primarily evident in the second and fourth order
cumulants. Increases in the fourth order cumulant also
indicate periods of nonGaussianity. The level of
shipping activity near the array is moderate to high
during times that the noise in set 1 and set 2 was
recorded. The increased variance and nonzero kurtosis
during the 0-5 minute and 20-30 minute segments of the
set 1 noise, and during the 7-10 and 17-24 minute
segments of the set 2 noise can be correlated with ship
activity within a 2 km radius of the receiver site. The
skewness shows minor fluctuations during these periods.
The two periods of increased variances in the set 2 noise
are due to two transient noise events, the first having a
duration of approximately 2 minutes and the second
having a duration of 3-6 minutes. While these noise
sources are of unknown origin, they are likely due to one
of two ships that are traveling within 0.8 km of the array
site during these times. When shipping activity
remained outside of a 2 km radius of the receiver site for
the two sets of noise, little variability in the noise
variance and kurtosis was found, even though there were
sometimes ships traveling within 5 km of the receiver
site.

Fig. 1. Time-varying cumulants for the
channel 2, set 1 ambient noise.

Fig. 2. Time-varying cumulants for the
channel 43, set 1 ambient noise.
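
The sliding-window cumulant calculation described in this section can be sketched as follows. This is a minimal illustration rather than the processing actually used: it assumes the 15-point figure is the shift between successive 1-second windows (1500/15 = 100 cumulant samples per second), it uses biased sample estimates, and the function names `cumulants`, `sliding_cumulants`, and `segment_summary` are hypothetical. The demonstration runs on synthetic Gaussian noise, not on the measured data.

```python
import random

def cumulants(window):
    """First four sample cumulants of one window: mean, variance, c3, c4."""
    n = len(window)
    mean = sum(window) / n
    centered = [v - mean for v in window]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    # After mean removal: c2 = m2, c3 = m3, c4 = m4 - 3 * m2^2.
    return mean, m2, m3, m4 - 3.0 * m2 ** 2

def sliding_cumulants(x, window_len, hop):
    """Cumulants of every full window, advancing by `hop` samples each time."""
    return [cumulants(x[s:s + window_len])
            for s in range(0, len(x) - window_len + 1, hop)]

def segment_summary(values):
    """Average, minimum and maximum of one cumulant track over a segment."""
    return sum(values) / len(values), min(values), max(values)

# Demo on three seconds of synthetic Gaussian noise (assumed, not measured).
random.seed(0)
fs = 1500                       # samples per second, as in the recording
noise = [random.gauss(0.0, 1.0) for _ in range(3 * fs)]
tracks = sliding_cumulants(noise, window_len=fs, hop=15)
variance_track = [c[1] for c in tracks]
print(len(tracks), segment_summary(variance_track))
```

Applying `segment_summary` to each cumulant track over a 1-minute segment gives the average, minimum, and maximum values of the kind plotted as circles and dashed curves in the figures.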


Fig. 4. Time-varying cumulants for the
channel 43, set 2 ambient noise.


3.  Stationarity  and  Gaussianity 

The cumulant curves in Figs. 1-4 indicate that the
shipping noise contains significant periods of
nonstationarity. A two-sample Kolmogorov-Smirnov
(K-S) test for goodness of fit is used to quantify the
periods of stationarity/nonstationarity in the noise. To
reduce the computational burden, the first 27.3 minutes
of noise from the 30-minute sets are separated into three
9.1-minute segments for analysis. The test is applied
successively over the data, using pairs of 1 s windows
with  no  overlap.  Breaks  in  stationarity  are  indicated  by 
significant  changes  in  the  cumulative  probability 
distribution  of  the  noise  (at  the  95%  confidence  level),  as 
determined  by  the  K-S  test.  The  test  reveals  that 
stationary  periods  in  the  noise  are  episodic  and  vary  in 
time  duration  from  the  minimum  test  resolution  of  1.0s 
to  9.1  minutes,  as  shown  in  Fig.  5  for  channels  2  and  43 
from  noise  sets  1  and  2.  Each  circle  in  Fig.  5  represents 
a  time  point  at  which  two  successive  probability 
distributions  are  determined  to  be  different  by  the  K-S 
test.  Conversely,  time  periods  where  circles  are  absent 
represent  periods  of  stationarity.  Trends  that  appear  in 
the  stationarity  test  results  generally  correspond  to  the 
fluctuating  cumulants  in  Figs.  1-4. 
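
A minimal sketch of the successive two-sample K-S comparison is given below. Several details are assumptions: each window is compared with the one that immediately follows it, the decision uses the large-sample critical value c(α)·sqrt((n+m)/(nm)) with c(0.05) ≈ 1.358, and the function names are hypothetical. The demonstration series is synthetic.

```python
import math

def ks_two_sample(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        # Step both empirical CDFs past every sample equal to v (handles ties).
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def distributions_differ(a, b, c_alpha=1.358):
    """Asymptotic two-sample K-S decision; c_alpha = 1.358 is the 5% level."""
    n, m = len(a), len(b)
    return ks_two_sample(a, b) > c_alpha * math.sqrt((n + m) / (n * m))

def stationarity_breaks(x, window_len):
    """Indices of windows whose distribution differs from the previous window."""
    windows = [x[s:s + window_len]
               for s in range(0, len(x) - window_len + 1, window_len)]
    return [k + 1 for k in range(len(windows) - 1)
            if distributions_differ(windows[k], windows[k + 1])]

# Synthetic example: two identical windows followed by a shifted one.
base = [i / 100.0 for i in range(100)]
series = base + base + [v + 10.0 for v in base]
print(stationarity_breaks(series, window_len=100))
```

Runs of consecutive windows with no reported break correspond to the stationary periods, whose shortest resolvable length is one window.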
Fig. 5. Stationarity test results for a
1-second processing window.

A discrepancy between the cumulant calculations and
the stationarity test appears in the set 1, channel 43
noise during the first 3 minutes, in that the
noise appears stationary by the K-S test but the noise
variance is declining. This can be accounted for by
noting that the stationarity test results are based on the
rate of change in the cumulants within each successive
1-second processing window and are thus most sensitive to
abrupt changes, while the cumulant curves in Figs. 1-4
show the average cumulant level over 1 minute. Note
that there are periods of nonstationarity even when the
cumulants indicate that the noise is Gaussian, due to
changes in the noise variance.

To quantify the periods of nonGaussianity, a one-sample
K-S test is applied to the two 30-minute noise
sets. Each processing window is determined to be either
Gaussian or nonGaussian. The percent of 1-second
processing windows that are nonGaussian in each
1-minute noise segment is calculated and shown in Fig. 6.
Test results indicate the existence of periods of
nonGaussianity in the noise that correlate with the
results of the cumulant calculations and some of the
periods of nonstationarity. The periods of
nonGaussianity generally last for several minutes.
However, except for the transient noise events in the set
2 noise, and the 0-5 minute noise segment in the set 1
noise, the noise in most of the processing windows is
Gaussian. Note that this figure does not reveal the
levels of nonGaussianity, as these can be observed in
Figs. 1-4, and are discussed in detail in Ref. [8].
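
The per-window Gaussianity decision and the percentage shown in Fig. 6 can be sketched along the following lines. The text does not say how the reference Gaussian is parameterized, so this sketch assumes each window is standardized with its own sample mean and standard deviation and then compared with the standard normal CDF, using the large-sample 5% critical value 1.358/sqrt(n). With estimated parameters that critical value is conservative (the situation addressed by the Lilliefors correction). All function names are hypothetical and the test windows are synthetic.

```python
import math
from statistics import NormalDist

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_gaussian_statistic(window):
    """One-sample K-S distance between a standardized window and N(0, 1)."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / n)
    if std == 0.0:
        raise ValueError("constant window: cannot standardize")
    z = sorted((v - mean) / std for v in window)
    d = 0.0
    for i, zi in enumerate(z):
        f = normal_cdf(zi)
        # The empirical CDF jumps from i/n to (i+1)/n at zi; check both sides.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

def is_nongaussian(window, c_alpha=1.358):
    """True when the K-S distance exceeds the asymptotic 5% critical value."""
    return ks_gaussian_statistic(window) > c_alpha / math.sqrt(len(window))

def percent_nongaussian(x, window_len):
    """Percentage of non-overlapping windows flagged as nonGaussian."""
    windows = [x[s:s + window_len]
               for s in range(0, len(x) - window_len + 1, window_len)]
    flags = [is_nongaussian(w) for w in windows]
    return 100.0 * sum(flags) / len(flags)

# Synthetic windows: Gaussian quantiles versus a two-valued sequence.
gauss_like = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
two_valued = [-1.0, 1.0] * 100
print(percent_nongaussian(gauss_like + two_valued, window_len=200))
```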
Fig. 6. Percent of nonGaussian
processing windows in each one minute
segment for channels 2 and 43, noise
sets 1 and 2.

4. Spatial correlation

The ambient shipping noise is spatially correlated,
i.e., correlated between the different hydrophones. This
can affect signal processing algorithms, whose
performance is often predicted and/or benchmarked using
independent noise. Since the energy detector and some
higher order statistics detectors make use of only the zero
lag correlation value, the correlation coefficient is a
useful measure of the spatial correlation of the measured
noise. The zero lag correlation coefficient between two
channels of data is given by

with t = kΔt, and is 1.0 when x1(t) = x2(t). In Fig. 7, the
time-dependent correlation coefficients of the first three
minutes of the set 1 measured noise are shown, for
channel 2 correlated with each of channels 16, 31, and
43. These values are calculated using a 1-second
processing window and 50% overlap. For independent
Gaussian noise and the same processing parameters, the
average absolute value of C2 is 0.02 and the maximum is
approximately 0.09. By comparison, the shipping noise
is significantly correlated between the pairs of channels
for this 3-minute segment. Channels 2 and 16 are
positively correlated over the 3-minute time duration,
channels 2 and 31 are positively correlated over about
half the duration and negatively correlated, or
anticorrelated, for the remaining duration. Channels 2
and 43 are negatively correlated for nearly the entire
3-minute segment. These results show that the degree of
interchannel correlation changes from positively
correlated to negatively correlated as the depth spacing
between channels increases. Independent noise is
theoretically zero correlated. A positively correlated
signal can be detected better in negatively correlated noise
than in either independent or positively correlated noise.
Negative correlation could result from an odd number of
surface bounces in the path to one hydrophone and an
even number to the other for dominant noise sources.
This suggests that one should detect on positive and
negative signal peaks when signal and noise can have
opposite correlation.

Fig. 7. Time-dependent correlation
coefficients from the set 1 noise between
channel 2 and (a) channel 16, (b) channel
31, and (c) channel 43.
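
As an illustration of the windowed zero lag correlation coefficient, the sketch below assumes the usual energy-normalized form, the sum of x1(t)x2(t) divided by the square root of the product of the two channel energies, with no mean removal. That form is an assumption, chosen because it equals 1.0 when the two channels are identical; the function names are hypothetical. For independent Gaussian noise and N = 1500 samples it predicts an average absolute value of about sqrt(2/(πN)) ≈ 0.02, which agrees with the figure quoted above.

```python
import math

def zero_lag_coefficient(x1, x2):
    """Assumed form: sum(x1*x2) / sqrt(sum(x1^2) * sum(x2^2))."""
    num = sum(a * b for a, b in zip(x1, x2))
    den = math.sqrt(sum(a * a for a in x1) * sum(b * b for b in x2))
    return num / den

def sliding_coefficients(x1, x2, window_len, overlap=0.5):
    """Coefficient in each window; windows overlap by the given fraction."""
    hop = max(1, int(window_len * (1.0 - overlap)))
    n = min(len(x1), len(x2))
    return [zero_lag_coefficient(x1[s:s + window_len], x2[s:s + window_len])
            for s in range(0, n - window_len + 1, hop)]

# Identical channels give +1, sign-inverted channels give -1.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(sliding_coefficients(a, a, window_len=4))
print(sliding_coefficients(a, [-v for v in a], window_len=4))
```

The mean, minimum, and maximum of the resulting track over each 1-minute duration would give summary values of the kind reported for channels 2 and 43.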

To  examine  the  interchannel  correlation  of  the  noise 
over  the  entire  30-minute  segment,  the  channel  2  and 
channel  43  noise  is  divided  into  approximately  1-minute 
durations  for  processing  (as  done  for  the  cumulant 
calculations  shown  above).  The  mean,  minimum,  and 
maximum  C2  values  in  each  1-minute  duration  are  shown 
in  Fig.  8  for  noise  sets  1  and  2.  These  two  channels  are 
negatively  correlated  for  most  of  the  30-minute  duration 
in  both  sets  1  and  2.  The  maximum  and  minimum 
correlation  coefficients  over  the  30-minute  duration  are 
0.20  and  -0.32  for  set  1,  and  0.86  and  -0.75  for  set  2. 

For  the  set  1  noise,  the  degree  of  negative  correlation 
is  larger  during  the  0-7  minute  period  when  the  channel  2 
noise  is  nonGaussian  and  the  channel  43  noise  is 
Gaussian.  For  the  set  2  noise,  strong  positive  and 
negative  C2  values  appear  during  the  two  transient  noise 
events  during  the  7-10  and  17-24  minute  segments. 


variability,  stationarity,  nonGaussianity,  and  spatial 
correlation.  Calculations  of  second  through  fourth  order 
cumulants  for  two  channels  of  noise  recorded  on  a 
vertical  array  indicate  that  there  are  significant  changes  in 
both  variance  and  kurtosis,  in  some  instances  due  to  loud 
transient  noise  events  lasting  2-6  minutes,  which  are 
probably  generated  by  ships  moving  within  1  km  of  the 
array. A two-sample K-S test is used to quantify the
periods  of  stationarity,  which  last  from  the  minimum 
test  resolution  of  1-second  to  over  9.1  minutes.  A  one- 
sample  K-S  test  reveals  periods  of  nonGaussianity  that 
are  consistent  with  the  time-varying  levels  of  kurtosis 
and  are  episodic,  lasting  for  periods  of  up  to  several 
minutes.  Except  for  these  periods,  the  noise  is 
Gaussian,  even  though  it  may  be  nonstationary.  Second 
and  fourth  order  correlation  coefficients  of  the  noise 
reveal  that  the  noise  is  spatially  correlated  between  pairs 
of  hydrophones  with  the  correlation  ranging  from 
positive  to  negative  as  the  sensor  spacing  increases. 
Fig. 8. Time-varying correlation
coefficients between channel 2 and
channel 43 from (a) the set 1 noise, and
(b) the set 2 noise.
Abstract

A  comparison  between  second  and  fourth  order 
moment  detectors  is  made  for  three  low  frequency 
transient  signals  embedded  in  measured  ambient  shipping 
noise  from  a  shallow-water  area.  Detector  evaluations 
must  be  made  over  short  periods  of  time  because  of  the 
nonstationary  nature  of  the  noise.  Results  indicate  that 
the  fourth  order  moment  detects  better  than  the  second 
order  moment  for  the  two  nonGaussian  transient  signals, 
but  not  as  well  for  the  more  Gaussian  transient  signal. 
This  suggests  that  the  fourth  order  moment  detector  be 
used  in  addition  to  the  second  order  moment  detector, 
rather  than  as  a  replacement.  The  results  of  detection 
simulations  with  independent  Gaussian  noise  are  given 
for  comparison  to  the  results  with  the  measured  shipping 
noise.  The  comparison  indicates  that  gains  for  the 
fourth  order  moment  detector  over  the  second  order 
moment  detector  are  higher  in  the  shipping  noise  than  in 
the  Gaussian  noise  for  the  two  nonGaussian  signals. 

1.  Introduction 

Since  sonar  targets  have  recently  become  quieter  and 
their  signatures  more  difficult  to  detect,  especially  in 
coastal and shallow-water areas, attempts are being made
to  exploit  target-generated  transient  signals  to  improve 
detection.  One  approach  is  to  capitalize  on  the 
nonGaussian  nature  that  often  characterizes  transient 
signals  through  the  use  of  higher  order  statistics.  The 
specific  detection  problem  addressed  in  this  paper  is  that 
of  passively  detecting  a  single  occurrence  of  a 
deterministic  transient,  or  energy,  signal  in  nonstationary 
noise [1]-[6]. The detectors do not assume noise or signal
stationarity.  The  signal  is  assumed  to  be  received  on 
either  one  or  two  hydrophones,  or  channels,  with 
channels  repeated  in  the  higher  order  moments  as  needed. 
While the ambient shipping noise is spatially correlated,
it is assumed that the signal to be detected is more
correlated than the noise.

2. Moments and detection methodology

The  p-th  order  moment  detector  uses  the  zero  lag  of  the 
p-th  order  correlation  as  a  test  statistic  and  compares  it  to 
a  predetermined  threshold.  A  decision  is  then  made  as  to 
whether  the  test  data  contains  a  signal  embedded  in  noise 
or only noise. The p-th order moment of a single
channel of input data, x(t), is

$$ m_p = \sum_{k=0}^{N-1} x^p(t), $$

with t = kΔt. For two channels of input data, the
second order moment is

$$ m_2 = \sum_{k=0}^{N-1} x_1(t)\, x_2(t), $$

and the fourth order moment is defined by

$$ m_4 = \sum_{k=0}^{N-1} x_1^3(t)\, x_2(t). $$

The latter could also be defined by squaring x_1(t) and
x_2(t) to form m_4, but this generally leads to poorer
detection results when the noise is spatially correlated
[7]. The p-th order moment is equal to the area, volume,
or hypervolume beneath the corresponding moment
spectrum, i.e.,

$$ m_2 = \sum_{j=-N/2}^{N/2} ES(f)\,\Delta f = \sum_{j=-N/2}^{N/2} X(f)\,X^*(f)\,\Delta f = \sum_{j=-N/2}^{N/2} |X(f)|^2\,\Delta f, $$

and

$$ m_4 = \sum_{j_1=-N/2}^{N/2} \sum_{j_2=-N/2}^{N/2} \sum_{j_3=-N/2}^{N/2} TS(f_1, f_2, f_3)\,\Delta f_1\,\Delta f_2\,\Delta f_3 $$

$$ \phantom{m_4} = \sum_{j_1=-N/2}^{N/2} \sum_{j_2=-N/2}^{N/2} \sum_{j_3=-N/2}^{N/2} X(f_1)\,X(f_2)\,X(f_3)\,X^*(f_1+f_2+f_3)\,\Delta f_1\,\Delta f_2\,\Delta f_3 $$

for a single channel of input data, where ES is the energy
spectrum, TS is the trispectrum, and X(f) is the Fourier
transform of x(t) with f = jΔf. In this sense, the
trispectrum is a higher-order extension of the energy.
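
The zero-lag moment statistics and the threshold comparison can be sketched as below. The single-channel and second-order two-channel forms follow directly from the definitions in the text. The fourth-order two-channel form shown here (one channel repeated three times) is an assumption, chosen because the text says it is not the form obtained by squaring each channel. All function names are hypothetical, and the threshold is a free parameter that would in practice be set from noise-only data for the desired false alarm rate.

```python
def moment_single(x, p):
    """p-th order zero-lag moment of one channel: sum of x(t)^p."""
    return sum(v ** p for v in x)

def moment2_two_channel(x1, x2):
    """Second order zero-lag moment of two channels: sum of x1(t) * x2(t)."""
    return sum(a * b for a, b in zip(x1, x2))

def moment4_two_channel(x1, x2):
    """Assumed fourth order form: sum of x1(t)^3 * x2(t)."""
    return sum(a ** 3 * b for a, b in zip(x1, x2))

def moment4_squared(x1, x2):
    """Alternative form noted in the text: sum of x1(t)^2 * x2(t)^2."""
    return sum((a * a) * (b * b) for a, b in zip(x1, x2))

def detect(statistic, threshold):
    """Declare 'signal present' when the test statistic exceeds the threshold."""
    return statistic > threshold

# Small worked example on two short hypothetical channels.
x1 = [1.0, 2.0]
x2 = [3.0, 4.0]
print(moment_single(x1, 2), moment2_two_channel(x1, x2),
      moment4_two_channel(x1, x2), moment4_squared(x1, x2))
```

Because no averaging is applied, each statistic is a single number per processing window, which is what allows the detector to be used without assuming stationarity.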

Since  transient  signals  have  finite  energy  and  moments 
that  vary  over  time,  averaging  to  achieve  noise 
suppression  must  be  used  cautiously,  and  is  generally 
possible  only  if  the  transient  signal  is  of  long  duration 
and/or  the  sampling  rate  of  the  sonar  system  is  high.  To 
avoid  these  assumptions  and  the  assumption  of  noise 
stationarity,  the  model  used  here  assumes  no  averaging. 

An  additional  processing  step  may  be  added  to  the 
detection  algorithm.  Previous  simulations  of  moment 
detector  performance  in  Gaussian  noise  have  shown  that 
the  performance  of  the  fourth  order  moment  detector  can 
improve  relative  to  the  second  order  moment  detector 
through  the  use  of  simple  passband  filtering  [2],  [3]. 
That  is,  one-dimensional  passband  filtering  of  the 
received  channel  data  prior  to  applying  the  detector  has  a 
nonlinear  and  advantageous  effect  in  higher  order 
moments.  This  process  is  referred  to  as  prefiltering.  In 
passive  detection,  it  is  unlikely  that  a  good  estimate  of 
the  signal  passband  will  be  available  unless  one  is 
searching  for  a  specific  class  of  signals.  However, 
prefiltering  can  be  used  by  dividing  the  data  passband 
into  smaller  frequency  bands  and  applying  the  moment 
detectors  successively  in  the  smaller  bands.  Thus,  the 
use  of  prefiltering  in  its  simplest  form  can  determine 
whether  or  not  it  has  merit  for  passive  sonar  applications 
and  whether  it  should  be  pursued  in  more  practical 
applications. 
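
The band-by-band use of prefiltering described above might look like the following sketch. The filter here is an ideal brick-wall passband applied by zeroing discrete Fourier transform bins, which is an assumption (the text specifies only simple passband filtering), and a direct O(N^2) transform is used to keep the example dependency-free. The names `bandpass` and `band_moments` and the test signal are hypothetical.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (O(N^2), adequate for short windows)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft_real(X):
    """Real part of the inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[j] * cmath.exp(2j * math.pi * j * k / n)
                for j in range(n)).real / n
            for k in range(n)]

def bandpass(x, fs, f_lo, f_hi):
    """Keep only components with f_lo <= |f| <= f_hi (ideal passband)."""
    n = len(x)
    X = dft(x)
    for j in range(n):
        f = min(j, n - j) * fs / n      # magnitude of the frequency of bin j
        if not (f_lo <= f <= f_hi):
            X[j] = 0.0
    return idft_real(X)

def band_moments(x, fs, bands, p=4):
    """p-th order zero-lag moment of the data after prefiltering in each band."""
    return {band: sum(v ** p for v in bandpass(x, fs, band[0], band[1]))
            for band in bands}

# One second of a 5 Hz plus 20 Hz test signal sampled at 64 samples/second.
fs = 64
x = [math.sin(2 * math.pi * 5 * k / fs) + math.sin(2 * math.pi * 20 * k / fs)
     for k in range(fs)]
low = bandpass(x, fs, 0.0, 10.0)
print(band_moments(x, fs, [(0.0, 10.0), (10.0, 32.0)], p=2))
```

Each band's moment would then be compared with its own threshold, so that no prior estimate of the signal passband is required.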

3.  Test  data 


The  ambient  shipping  noise  used  for  evaluation  of  the 
moment  detectors  was  recorded  during  the  Shallow  Water 
Evaluation  Cells  Exercise-3  (SWellEX-3)  which  occurred 
during July-August 1994 near the port of San Diego,
California  [8].  Ambient  noise  measurements  were 
recorded  on  a  vertical  array  of  hydrophones,  located  in 
water  approximately  200  m  deep.  The  area  is 
characterized  by  heavy  shipping  traffic,  including 
military,  commercial,  and  recreational  vessels.  The 
noise  is  sampled  at  1500  samples/second.  Over  29 
minutes  of  ambient  shipping  noise,  received  at  two 


channels,  is  used  for  the  detection  analysis.  The  deeper 
channel  is  located  at  depth  192  m  and  the  shallower 
channel  is  located  at  depth  116  m. 

Three  low  frequency  synthetic  test  signals  are  used  in 
simulations  with  the  measured  noise.  The  first  signal  is 
a  simulated  whale  cry  and  is  almost  Gaussian  with  a 
kurtosis  of  -0.73.  This  signal  is  not  expected  to  be 
easily  detectable  with  the  higher  order  moments.  The 
second  is  an  exponentially  damped  sinusoid  with  a 
moderately  large  kurtosis  of  5.89.  The  third  signal  is  a 
narrow-time  pulse.  This  signal  is  broadband  and 
significantly  nonGaussian  (kurtosis  =  262.65).  The 
three  test  signals  and  their  energy  spectra  are  shown  in 
Figs.  1  and  2.  Simulations  are  performed  with 
processing  windows  equal  to  the  signal  durations  of  one 
second. 

Passbands  for  prefiltering  are  chosen  to  be  10-25  Hz 
for  the  whale,  0-40  Hz  for  the  sinusoid,  and  0-256  Hz  for 
the  pulse.  A  priori  knowledge  of  the  signal  passband  is 
generally  not  available.  However,  these  detectors  can  be 
applied  successively  in  sequential  frequency  bands.  Use 
of  such  an  algorithm  will  likely  give  somewhat  different 
results  than  the  broadband  application  tested  here  since 
the  moments  of  the  partial  signal  in  each  frequency  band 
determine  detection,  rather  than  the  moments  of  the 
entire  signal.  The  effects  of  this  kind  of  processing  are 
currently  being  investigated. 


Fig. 1. Three test signals. The whale is
almost Gaussian, the sinusoid is
moderately nonGaussian, and the pulse is
strongly nonGaussian.

Fig. 2. Energy spectra of test signals.


4.  Simulations  and  results 

Since  the  measured  noise  is  nonstationary,  statistical 
measures  of  performance  are  not  derived  from 
simulations.  Instead,  an  average  measure  of  performance 
over  short  periods  of  time  is  given.  The  29  minutes  of 
ambient  noise  are  separated  into  segments  lasting  1.82 
minutes,  and  each  segment  is  divided  into  109  processing 
windows.  For  a  range  of  signal-to-noise  ratios  (SNRs), 
the  signal  is  added  to  the  noise  in  each  processing 
window,  one  of  the  detectors  is  applied,  and  a  resulting 
curve of probability of detection (Pd) versus SNR is
obtained for probability of false alarm (Pfa) equal to
0.009. The Pd versus SNR curve is interpolated at
Pd = 0.5 to give a measure of performance that consists of
a  single  number.  The  SNR  gain  is  defined  as  the 
difference  between  these  two  numbers  for  the  second  and 
fourth  order  moment  detectors,  with  a  positive  SNR  gain 
indicating  that  the  fourth  order  moment  detector  performs 
better  than  the  second  order  moment  detector. 
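
The SNR gain measure can be sketched as follows, assuming linear interpolation between the sampled points of each detection curve (the interpolation method is not stated in the text). The function names and the example curves are hypothetical and are not results from the paper.

```python
def snr_at_pd(snrs_db, pds, target=0.5):
    """SNR (dB) at which a rising Pd curve first reaches `target`."""
    for i in range(len(snrs_db) - 1):
        s0, s1 = snrs_db[i], snrs_db[i + 1]
        p0, p1 = pds[i], pds[i + 1]
        if p0 <= target <= p1 and p1 != p0:
            # Linear interpolation between the two bracketing points.
            return s0 + (target - p0) * (s1 - s0) / (p1 - p0)
    raise ValueError("Pd curve does not cross the target value")

def snr_gain(snrs_db, pd_second, pd_fourth, target=0.5):
    """Positive when the fourth order detector reaches Pd = target at lower SNR."""
    return (snr_at_pd(snrs_db, pd_second, target)
            - snr_at_pd(snrs_db, pd_fourth, target))

# Hypothetical detection curves on a common SNR grid.
snrs = [-10.0, -5.0, 0.0, 5.0]
pd2 = [0.0, 0.2, 0.8, 1.0]      # second order moment detector
pd4 = [0.1, 0.7, 1.0, 1.0]      # fourth order moment detector
print(snr_gain(snrs, pd2, pd4))
```

With 109 processing windows per segment, a false alarm probability of 0.009 corresponds to roughly one noise-only window in 109 exceeding the threshold.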

The  results  of  these  simulations  using  one  and  two 
channels  of  input  data  and  the  three  test  signals  are 
shown  in  Figs.  3-6.  The  single  channel  case  uses  noise 
received  at  the  shallower  hydrophone.  Each  point  in 
these  curves  represents  the  SNR  gain  over  a  1.82  minute 
time  period.  The  average  SNR  gains  over  the  entire  29 
minute  time  periods  are  given  in  Tables  1  and  2. 

For comparison, detection results using simulated
independent Gaussian noise are included in Tables 3 and
4. Since this noise is stationary, 1000 realizations of
noise per channel are used to calculate the SNR gain in a
1.82 minute segment.

Comparison  of  the  second  and  fourth  order  moment 
detectors  indicates  that  the  fourth  order  moment  can 
detect  transient  signals  better  than  the  second  order 
moment,  or  energy,  detector  for  the  two  nonGaussian 
signals  when  either  one  or  two  channels  of  input  are  used 
in  the  moment  definitions,  and  no  prefiltering  is  used 
(see  Figs.  3  and  4).  In  some  cases,  about  1  dB  of  gain  is 
achieved  by  using  two  channels  of  input  for  detection 
over  using  a  single  channel.  Variations  in  the  SNR 
gains  over  time  are  generally  as  high  as  3  dB. 

The  fourth  order  moment  detector  performs  better  than 
the  second  order  moment  detector  for  the  sinusoid  signal 
for  the  majority  of  the  1.82  minute  time  periods.  The 
average  SNR  gains  over  the  29  minutes  are  0.73  dB  and 
0.66  dB  for  the  one  and  two  channel  cases.  For  the 
highly  nonGaussian  pulse,  the  fourth  order  moment 
always  detects  best,  with  SNR  gains  of  over  4  dB  and 
29-minute  average  SNR  gains  of  5.94  dB  and  6.02  dB  for 
the  one  and  two  channel  cases.  The  fourth  order  moment 
detector  does  not  usually  show  improvement  over  the 
second  order  moment  for  the  whale  signal.  The  average 
SNR  gains  here  are  -0.24  dB  and  -0.27  dB  for  the  one 
and  two  channel  cases. 

When  prefiltering  is  used,  the  SNR  gains  increase  for 
the  sinusoid  signal  (compare  Figs.  3  and  5).  The  29 
minute  average  SNR  gains  for  the  one  and  two  channel 
cases  approximately  double.  Average  SNR  gains  change 
less  than  5%  when  prefiltering  is  added  to  the  detection 
processing  for  the  pulse  due  to  the  broadband  and  low 
frequency  nature  of  the  signal.  The  higher  noise 
frequencies  that  are  filtered  (257-750  Hz)  contain  much 
less energy than the lower frequencies for the shipping
noise. Prefiltering causes the fourth order moment
detector to perform worse for the whale signal. This is
not  the  case  with  Gaussian  noise,  where  prefiltering 
increases  the  SNR  gain  of  the  fourth  order  detector. 

Additional  differences  occur  when  comparing  Gaussian 
noise  to  the  measured  ambient  shipping  noise.  Without 
prefiltering, the SNR gains for the pulse are 5.94 dB and
6.02  dB  for  the  shipping  noise  with  one  and  two 
channels  of  input,  but  are  only  3.47  dB  and  3.53  dB  for 
the  Gaussian  noise.  Higher  gains  also  occur  for  the 
shipping  noise  and  the  pulse  when  prefiltering  is  used, 
and  for  the  sinusoid  with  and  without  prefiltering. 

Decreases  in  the  average  SNR  gains  with  prefiltering 
occur  only  in  the  measured  noise  cases.  Since  the 
average  Gaussian  noise  is  uniformly  distributed  across 
frequencies,  prefiltering  advantages  in  the  fourth  order 
moment  over  the  second  order  moment  are  always  greater 
[2],  [3].  However,  in  the  nonstationary  measured 
shipping  noise,  which  is  not  uniformly  distributed  across 
frequencies, the fourth order advantage compared to
second  order  is  somewhat  unpredictable. 


            No Prefiltering    Prefiltering
Whale            -0.24            -0.57
Sinusoid          0.73             1.48
Pulse             5.94             6.11

Table 1. Average SNR gains (in dB) for
the tricorrelation detector over the cross
correlation detector in the shipping noise
with one channel of input.


            No Prefiltering    Prefiltering
Whale            -0.27            -0.98
Sinusoid          0.66             1.16
Pulse             6.02             5.73

Table 2. Average SNR gains (in dB) for
the tricorrelation detector over the cross
correlation detector in the shipping noise
with two channels of input.


            No Prefiltering    Prefiltering
Whale            -0.44            -0.24
Sinusoid         -0.23             0.45
Pulse             3.47             4.40

Table 3. SNR gains (in dB) for the
tricorrelation detector over the cross
correlation detector in independent
Gaussian noise with one channel of input.


            No Prefiltering    Prefiltering
Whale            -1.10            -0.45
Sinusoid         -1.29             0.84
Pulse             3.53             5.14

Table 4. SNR gains (in dB) for the
tricorrelation detector over the cross
correlation detector in independent
Gaussian noise with two channels of
input.


5.  Conclusions 

Simulations  with  three  low  frequency  test  signals 
embedded  in  both  independent  Gaussian  noise  and 
measured  ambient  shipping  noise  from  a  shallow-water 
area  indicate  that  the  fourth  order  moment  can  provide 
improved  detection  capability  over  the  second  order 
moment,  or  energy,  detector  if  the  transient  signal  is 
sufficiently  nonGaussian.  Since  the  measured  noise  is 
nonstationary,  measures  of  relative  detector  performance 
are calculated over short periods of time. Results
indicate that the SNR gains for the fourth order moment
detector over the second order are higher on average in the
shipping noise than they are in Gaussian noise for the
nonGaussian transient signals. Since the fourth order
moment detector does not perform as well as the second
order for the transient signal that is nearly Gaussian, it
may be best used in addition to the second order moment
detector, rather than as a replacement.
Fig. 3. SNR gains for the tricorrelation
detector over the cross correlation
detector using one channel of input data
and no prefiltering.

Fig. 4. SNR gains for the tricorrelation
detector over the cross correlation
detector using two channels of input data
and no prefiltering.

Fig. 5. SNR gains for the tricorrelation
detector over the cross correlation
detector using one channel of input data
and prefiltering.

Fig. 6. SNR gains for the tricorrelation
detector over the cross correlation
detector using two channels of input data
and prefiltering.
Abstract

The detection of two Spectrally Equivalent (SE)
processes is addressed. The two SE processes are modeled
using two SE parametric models: the noisy AR
model and the ARMA model. Higher-order statistics
are shown to be an efficient tool for the SE process
detection problem. A new detector based on the Higher-Order
Yule-Walker matrix singularity is studied. The
detector performance is compared in supervised and
unsupervised learning. The model order mismatch is then
studied.


1  Introduction 

The detection of Spectrally Equivalent (SE)
processes is an interesting problem in many signal
processing applications. SE processes have been observed
in spread spectrum communications, including
cellular mobile systems, military applications, geodesic
survey, and position location for vehicles [5]. The SE
detection problem has also received much attention in
SAR image processing. For instance, the detection of
non-linearities contained in SAR images using Volterra
models can reduce to comparing signals having the same
spectrum [4]. The detection of signals that have
very close spectra over a large bandwidth is difficult
to achieve using techniques based on second-order
statistics.

This paper proposes to model SE processes by two
SE parametric models: the ARMA model and the
noisy AR model. This modeling was studied successfully
in [8], where it was proved that SE ARMA and
noisy AR processes cannot have the same Higher-Order
Statistics (HOS). This property induced a binary
hypothesis test based on higher-order cumulants.
Unfortunately, this test cannot be easily generalized to
composite hypotheses.

This paper studies new SE signal detectors based on the HOS
Yule-Walker equations (HOYWE). Section 3 shows that the singularity
of the Higher-Order Yule-Walker matrix is an efficient tool for SE
process detection. A Likelihood Ratio Detector (LRD) based on the
determinant of this matrix is developed for the case where the signal
parameters are known. However, the signal parameters are unknown in
most practical applications and have to be estimated. Section 4
generalizes the proposed detector to composite hypotheses. This
generalization is very simple, since the test statistic does not
depend on the model parameters. Section 5 studies the model order
mismatch problem. This section shows that an overestimated model
order does not deteriorate the detector performance. Simulation
results and conclusions are reported in sections 6 and 7.

2  Problem  Formulation 

The study is restricted to stationary and ergodic processes, with
symmetric continuous spectra, which can be modeled by linear
processes. The modeling of two SE processes by parametric ARMA and
noisy AR processes can be justified by the following property:

If S is a symmetric continuous spectrum, there exists an AR(p)
process, an MA(q) process and an ARMA(p, q) process whose spectra
approximate S as closely as desired [1].

Consider the two following signal models:

• An AR(p) process x(n) driven by an input e(n) and corrupted by an
additive random Gaussian noise b(n):

$$y_0(n) = x(n) + b(n) \qquad (1)$$

with:

$$x(n) = -\sum_{j=1}^{p} a_j\, x(n-j) + e(n) \qquad (2)$$

e(n) and b(n) are assumed to be i.i.d. and mutually independent.

• An ARMA(p, p) process $y_1(n)$, driven by a non-Gaussian i.i.d.
input g(n), with the same mean and power spectral density as
$y_0(n)$, defined by

$$y_1(n) = -\sum_{j=1}^{p} a_j\, y_1(n-j) + \sum_{j=0}^{p} b_j\, g(n-j) \qquad (3)$$

The spectral density of $y_0(n)$ is a special ARMA(p, p) spectrum
[3]. The spectral equivalence for stable processes $y_0(n)$ and
$y_1(n)$ implies that the ARMA and noisy AR model parameters are
linked by a one-to-one transformation.
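
The noisy AR model of eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sign convention x(n) = -sum a_j x(n-j) + e(n) is the one implied by eq. (5), the input is a centered unit-variance exponential sequence as in the simulations of section 6, and the function names, the noise level and the seed are hypothetical.

```python
import random

def ar_filter(a, e):
    """Run x(n) = -sum_{j=1..p} a[j-1] * x(n-j) + e(n) with zero initial state."""
    x = []
    for n, e_n in enumerate(e):
        acc = e_n
        for j, a_j in enumerate(a, start=1):
            if n - j >= 0:
                acc -= a_j * x[n - j]
        x.append(acc)
    return x

def noisy_ar(a, n_samples, noise_std, seed=0):
    """Simulate y0(n) = x(n) + b(n): an AR signal driven by a zero-mean
    exponential i.i.d. input, plus white Gaussian noise b(n)."""
    rng = random.Random(seed)
    # expovariate(1.0) has mean 1 and variance 1; subtracting 1 centers it.
    e = [rng.expovariate(1.0) - 1.0 for _ in range(n_samples)]
    x = ar_filter(a, e)
    return [x_n + rng.gauss(0.0, noise_std) for x_n in x]

# AR(1) with parameters [1; -0.5], i.e. x(n) = 0.5 x(n-1) + e(n).
# noise_std = 0.4 is an arbitrary illustrative value.
y0 = noisy_ar([-0.5], 1000, noise_std=0.4)
```

Feeding a unit impulse to `ar_filter([-0.5], ...)` returns the impulse response 1, 0.5, 0.25, ..., which is a quick way to check the sign convention.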

3 Cumulant-Based Detector with Known
Model Parameters (“KP Detector”)

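The SE process detection problem can be expressed as a binary
hypothesis testing problem:

$$H_0 \text{ (noisy AR process)}: \; y(n) = y_0(n), \qquad H_1 \text{ (ARMA process)}: \; y(n) = y_1(n), \qquad n = 0, \ldots, N-1 \qquad (4)$$

The Likelihood Ratio Test (LRT) for hypotheses $H_0$ and $H_1$ cannot
be easily derived for non-Gaussian AR and ARMA processes.
Consequently, suboptimal detectors have to be considered. This paper
proposes to use the Higher-Order Yule-Walker Equations for the SE
process detection. The HOYWE for an AR(p) process are defined by [6]:

$$\sum_{j=0}^{p} a_j\, C_k^{x}(m-j, 0, \ldots, 0) = 0, \quad \forall m > 0 \qquad (5)$$

where $a_0 = 1$ and $C_k^{x}(\tau_1, \tau_2, \ldots, \tau_{k-1})$
denotes the AR process kth-order cumulant at lag
$\tau = (\tau_1, \tau_2, \ldots, \tau_{k-1})$. The property
$C_k^{y_0}(\tau) = C_k^{x}(\tau)$ holds
$\forall k > 2, \forall \tau \in \mathbb{Z}^{k-1}$, since the additive
noise is Gaussian and independent of x(n). Denote
$\Delta_p(\xi) = \det(R_p(\xi))$ the determinant of the following
matrix

$$R_p(\xi) = \begin{pmatrix} \xi_{p+1} & \cdots & \xi_1 \\ \xi_{p+2} & \cdots & \xi_2 \\ \vdots & & \vdots \\ \xi_{2p+1} & \cdots & \xi_{p+1} \end{pmatrix} \qquad (6)$$

with $\xi = (\xi_1, \ldots, \xi_{2p+1})^T \in \mathbb{R}^{2p+1}$. Denote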

$$C_k^{0} = \left( C_k^{y_0}(1-p, 0, \ldots, 0), \ldots, C_k^{y_0}(1+p, 0, \ldots, 0) \right)^T \qquad (7)$$

The concatenation of eqs. (5) for $m \in \{1, \ldots, p+1\}$ leads to

$$R_p(C_k^{0})\, (1, a_1, \ldots, a_p)^T = 0 \qquad (8)$$

hence

$$\Delta_k^{0} \triangleq \Delta_p(C_k^{0}) = 0 \qquad (9)$$

On the other hand, for an ARMA(p, p) process, eqs. (5) hold for
$m > p$, but not for $m \in \{1, \ldots, p\}$. It is then legitimate
to assume that

$$\Delta_k^{1} \triangleq \Delta_p(C_k^{1}) \neq 0, \qquad (10)$$

where $C_k^{1}$ is defined as in (7) with ARMA process cumulants. The
SE noisy AR and ARMA process detection can then be expressed as a
binary simple hypothesis testing problem:
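
The singularity property behind (9) and (10) can be checked numerically on theoretical third-order cumulants. The sketch below is an illustration only, under these assumptions: k = 3, unit input skewness, cumulant slices computed from truncated impulse responses as C_3(tau, 0) = sum_i h(i)^2 h(i+tau), and an arbitrary ARMA(1,1) example that is not claimed to be spectrally equivalent to the AR(1) one. All function names are hypothetical.

```python
def third_order_slice(h, tau):
    """C_3(tau, 0) = sum_i h(i)^2 h(i + tau) for a linear process with
    impulse response h and unit input skewness."""
    total = 0.0
    for i, h_i in enumerate(h):
        if 0 <= i + tau < len(h):
            total += h_i * h_i * h[i + tau]
    return total

def hoyw_matrix(xi, p):
    """(p+1) x (p+1) matrix R_p of eq. (6): entry xi_{p+m-j} in row
    m = 1..p+1 and column j = 0..p (xi is stored 0-indexed)."""
    return [[xi[p + m - j - 1] for j in range(p + 1)] for m in range(1, p + 2)]

def det(mat):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in mat]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

p = 1
lags = range(1 - p, 2 + p)  # lags 1-p, ..., 1+p as in eq. (7)

# AR(1): x(n) = 0.5 x(n-1) + e(n), impulse response 0.5^i.
h_ar = [0.5 ** i for i in range(60)]
det_ar = det(hoyw_matrix([third_order_slice(h_ar, t) for t in lags], p))

# ARMA(1,1): y(n) = 0.5 y(n-1) + g(n) + 0.4 g(n-1).
h_arma = [1.0] + [0.9 * 0.5 ** (i - 1) for i in range(1, 60)]
det_arma = det(hoyw_matrix([third_order_slice(h_arma, t) for t in lags], p))
```

In this example `det_ar` vanishes up to rounding error while `det_arma` does not, because the Yule-Walker relation fails for m = 1 in the ARMA case.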

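$$H_0 \text{ (noisy AR process)}: \; \Delta_k = \Delta_k^{0} = 0, \qquad H_1 \text{ (ARMA process)}: \; \Delta_k = \Delta_k^{1} \neq 0 \qquad (11)$$

Define $\hat{C}_k$ as the sample cumulant vector obtained by replacing
the true cumulants in (7) by their usual estimates, and denote
$\hat{\Delta}_k = \Delta_p(\hat{C}_k)$. The noisy AR and ARMA process
cumulant vector estimate is asymptotically an unbiased Gaussian vector
such that
$N\, E[(\hat{C}_k - C_k^{i})(\hat{C}_k - C_k^{i})^T \mid H_i] = \Sigma_i$
[6]. According to ([1], p. 211), this property implies that the
determinant estimate $\hat{\Delta}_k$ is asymptotically an unbiased
Gaussian variable with:

$$\lim_{N \to \infty} N\, E[(\hat{\Delta}_k - \Delta_k^{i})^2 \mid H_i] = D_i^T \Sigma_i D_i \triangleq \sigma_i^2 \qquad (12)$$

In (12), $D_i$ is a vector whose jth element is
$D_i(j) = (\partial \Delta_p(\xi) / \partial \xi_j)(C_k^{i})$. It is
then straightforward to prove that:

$$D_i(j) = \sum_{m=1}^{n_j} \left( \mathrm{Cof}(R_p(C_k^{i})) \right)_m \qquad (13)$$

where $n_j$ denotes the number of occurrences of $\xi_j$ in
$R_p(\xi)$ and $(\mathrm{Cof}(R_p(\cdot)))_m$ the $R_p(\cdot)$ matrix
cofactor computed at the mth occurrence of $\xi_j$. The statistical
properties of $\hat{\Delta}_k$ can then be asymptotically derived
under both hypotheses:

$$H_0 \text{ (noisy AR process)}: \; \hat{\Delta}_k \sim \mathcal{N}(0, \sigma_0^2(N)), \qquad H_1 \text{ (ARMA process)}: \; \hat{\Delta}_k \sim \mathcal{N}(\Delta_k^{1}, \sigma_1^2(N)) \qquad (14)$$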
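with $\sigma_i^2(N) = \sigma_i^2 / N$. The LRT for these two
hypotheses can then be expressed as:

$$H_0 \text{ rejected if } T_1 = \frac{\hat{\Delta}_k^2}{\sigma_0^2(N)} - \frac{(\hat{\Delta}_k - \Delta_k^{1})^2}{\sigma_1^2(N)} > \kappa_1 \qquad (15)$$

In (15), $\kappa_1$ is a threshold depending on the False Alarm
Probability (FAP). Under both hypotheses $H_i$ ($i \in \{0, 1\}$),
$T_1$ can be normalized by $T_1 = \lambda_i T^{*} + \mu_i$, where
$T^{*}$ is distributed according to a non-central $\chi^2$
distribution with 1 degree of freedom and non-central parameter
$\delta_i = \sigma_i^2(N)\, (\Delta_k^{1})^2 / (\sigma_0^2(N) - \sigma_1^2(N))^2$.
For a fixed FAP, the Probability of Detection (PD) is then given by:

• First case: $\sigma_1^2 > \sigma_0^2$,

$$PD = 1 - \chi_1^2[\delta_1]\left( \left\{ \lambda_0\, \chi_1^2[\delta_0]^{-1}(1 - FAP) + \mu_0 - \mu_1 \right\} / \lambda_1 \right) \qquad (16)$$

• Second case: $\sigma_0^2 > \sigma_1^2$,

$$PD = \chi_1^2[\delta_1]\left( \left\{ \lambda_0\, \chi_1^2[\delta_0]^{-1}(FAP) + \mu_0 - \mu_1 \right\} / \lambda_1 \right) \qquad (17)$$

where $\chi_d^2[\delta](\cdot)$ and $\chi_d^2[\delta]^{-1}(\cdot)$
denote the cumulative distribution function (cdf) and its inverse for
a non-central $\chi^2$ distribution with d degrees of freedom and
non-central parameter $\delta$.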
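
As a rough numerical companion to the known-parameter test, the sketch below evaluates a likelihood-ratio statistic of the form used in (15) for a Gaussian determinant estimate and estimates one ROC point by Monte-Carlo simulation. The values of the mean, the two variances and the threshold are hypothetical and chosen only for illustration; the function names are not taken from the paper.

```python
import random

def lrt_statistic(delta_hat, delta1, var0, var1):
    """T_1 = delta_hat^2 / var0 - (delta_hat - delta1)^2 / var1."""
    return delta_hat ** 2 / var0 - (delta_hat - delta1) ** 2 / var1

def roc_point(delta1, var0, var1, threshold, trials=20000, seed=1):
    """Monte-Carlo estimates of (FAP, PD) for one threshold.
    H0: delta_hat ~ N(0, var0);  H1: delta_hat ~ N(delta1, var1)."""
    rng = random.Random(seed)
    false_alarms = sum(
        lrt_statistic(rng.gauss(0.0, var0 ** 0.5), delta1, var0, var1) > threshold
        for _ in range(trials))
    hits = sum(
        lrt_statistic(rng.gauss(delta1, var1 ** 0.5), delta1, var0, var1) > threshold
        for _ in range(trials))
    return false_alarms / trials, hits / trials

# Hypothetical values: determinant mean 1.0 under H1, variances 0.04 and 0.09.
fap, pd = roc_point(delta1=1.0, var0=0.04, var1=0.09, threshold=0.0)
```

Sweeping `threshold` traces an empirical ROC curve that can be compared with the closed-form PD expressions.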

4 Cumulant-based Detector with Unknown
Parameters (“UP Detector”)

This section addresses the SE process detection problem in
unsupervised learning (ARMA and noisy AR parameters unknown). Assume
that M independent realizations of $\hat{\Delta}_k$, denoted
$(\hat{\Delta}_{k,j})_{j=1,\ldots,M}$, are available. These M
measurements can be obtained from one single signal by segmentation.
This procedure consists in considering an N-sample signal as M
segments of K samples. The estimate $\hat{\Delta}_{k,j}$ is computed
from the jth segment samples. Two remarks have to be pointed out:

• the segment size K has to be large enough to obtain approximately
normally distributed determinant estimates

• the segments cannot be adjacent: two consecutive segments have to
be sufficiently separated to be independent. The separation between
two segments can be computed using the procedure proposed in [2].

Note that the previous conditions regarding the segment length and
the segment separation cannot be satisfied for short signals.
Consider the sequence $(\hat{\Delta}_{k,j})_{j=1,\ldots,M}$ of M
independent one-dimensional normal variables. Define
$\bar{\Delta}_k = \frac{1}{M} \sum_{j=1}^{M} \hat{\Delta}_{k,j}$ as
the sample mean and
$S^2 = \frac{1}{M} \sum_{j=1}^{M} (\hat{\Delta}_{k,j} - \bar{\Delta}_k)^2$
as the sample variance of the sequence.

It is well known that, for zero-mean variables,

$$T_2 = \sqrt{M-1}\, \frac{\bar{\Delta}_k}{S} \qquad (18)$$

has a Student distribution with M - 1 degrees of freedom [7].
Consequently, the distribution of $T_2$ under $H_0$ does not depend on
the model parameters (since $E[\hat{\Delta}_{k,j}] = 0$ under $H_0$).
The detection strategy is then defined by:

$$H_0 \text{ rejected if } T_2^2 > \tau_2 \qquad (19)$$

In (19), $\tau_2$ is a threshold independent of the model parameters.
The distribution of $T_2$ under hypothesis $H_1$ is unknown (since
$E[\hat{\Delta}_{k,j}] \neq 0$). Consequently, the theoretical PD
corresponding to (19) cannot be obtained. It has to be numerically
computed using Monte-Carlo simulations.

Note that the Generalized Likelihood Ratio detector based on the
observations $(\hat{\Delta}_{k,j})_{j=1,\ldots,M}$ leads to the same
test statistic $T_2$ (see appendix).
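
The parameter-free nature of this test can be illustrated with a short sketch. It assumes the Student-type statistic sqrt(M-1) * mean / S with S^2 normalized by 1/M, and it estimates the threshold by simulating zero-mean Gaussian determinant estimates instead of reading it from a Student table. The segment values, the number of trials and the function names are hypothetical.

```python
import math
import random

def t2_statistic(deltas):
    """T_2 = sqrt(M-1) * mean / S, with S^2 the 1/M-normalized sample variance."""
    m = len(deltas)
    mean = sum(deltas) / m
    s2 = sum((d - mean) ** 2 for d in deltas) / m
    return math.sqrt(m - 1) * mean / math.sqrt(s2)

def h0_threshold(m, fap, trials=20000, seed=2):
    """Monte-Carlo (1 - fap) quantile of T_2^2 under H0.
    The unit variance used here is arbitrary because T_2 is scale invariant."""
    rng = random.Random(seed)
    stats = sorted(
        t2_statistic([rng.gauss(0.0, 1.0) for _ in range(m)]) ** 2
        for _ in range(trials))
    return stats[int((1.0 - fap) * trials)]

tau2 = h0_threshold(m=10, fap=0.05)

# Ten hypothetical determinant estimates, one per signal segment.
segments = [0.9, 1.2, 1.1, 0.8, 1.0, 1.3, 0.7, 1.1, 0.9, 1.0]
reject_h0 = t2_statistic(segments) ** 2 > tau2
```

Rescaling all determinant estimates by a common factor leaves `t2_statistic` unchanged, which is why the threshold can be set without knowing the model parameters.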

5 Spectral Equivalence Model Order
Mismatch

Assume that the SE model order has been overestimated. Eq. (5) can
then be written:

$$\sum_{j=0}^{p'} a'_j\, C_k^{x}(m-j, 0, \ldots, 0) = 0, \quad \forall m > 0 \qquad (20)$$

with

$$a'_j = a_j \;\; \forall j \in \{1, \ldots, p\}, \qquad a'_j = 0 \;\; \forall j \in \{p+1, \ldots, p'\} \qquad (21)$$

p' is the overestimated order (p' > p). Denote $C_k'^{0}$ the vector
obtained by replacing p by p' in (7). Eq. (20) then yields:

$$\Delta_{p'}(C_k'^{0}) = \det(R_{p'}(C_k'^{0})) = 0 \qquad (22)$$

Moreover, it is still legitimate to assume that
$\Delta_{p'}(C_k'^{1}) \neq 0$. Consequently, the hypothesis testing
problem is exactly the same as (11). The only difference is that
larger cumulant slices are involved. As a conclusion, the model order
p can be estimated using a conventional technique such as Akaike's
[3]. A corrected estimate greater than p' will be preferred in case of
uncertainty.
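
The claim that an overestimated order keeps the determinant at zero under the AR hypothesis can be checked on theoretical cumulants. The sketch below assumes third-order cumulants (k = 3), unit input skewness and a truncated AR(1) impulse response; the helper names are hypothetical and the example only illustrates eq. (22), it does not reproduce the paper's simulations.

```python
def third_order_slice(h, tau):
    """C_3(tau, 0) = sum_i h(i)^2 h(i + tau), unit input skewness."""
    return sum(h_i * h_i * h[i + tau]
               for i, h_i in enumerate(h) if 0 <= i + tau < len(h))

def hoyw_matrix(xi, p):
    """Higher-order Yule-Walker matrix: entry xi_{p+m-j}, m = 1..p+1, j = 0..p."""
    return [[xi[p + m - j - 1] for j in range(p + 1)] for m in range(1, p + 2)]

def det(mat):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in mat]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

# True model: AR(1), x(n) = 0.5 x(n-1) + e(n), so p = 1.
h_ar = [0.5 ** i for i in range(60)]

# Determinants for the exact order and for two overestimated orders p'.
dets = {}
for p_over in (1, 2, 3):
    xi = [third_order_slice(h_ar, t) for t in range(1 - p_over, 2 + p_over)]
    dets[p_over] = det(hoyw_matrix(xi, p_over))
```

All three determinants are zero up to rounding error, since the padded coefficient vector (1, a_1, 0, ..., 0) stays in the null space of the larger matrix.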
6 Simulation Results


Many simulations have been performed to validate the previous
theoretical results. In this paper, the simulations are presented for
an AR(1) process with parameters [1; -0.5] driven by a zero mean
i.i.d. exponentially distributed input (with variance
$\sigma_e^2 = 1$). However, other model orders and AR parameters have
been studied and give similar performance. The detector performance
is computed as a function of the Signal-to-Noise Ratio defined by
$SNR = \sigma_x^2 / \sigma_b^2$. The ARMA model parameters (input
variance $\sigma_g^2$ and parameters $a_j$, $b_j$) are related to the
noisy AR process parameters by a one-to-one transformation. The ARMA
process is driven by a zero mean i.i.d. exponentially distributed
input. Numerical results have been computed with 2000 Monte-Carlo
simulations.

Fig. 1 shows the KP detector ROCs for different SNRs. The test
improves when the SNR decreases, in this particular case. The noisy
AR process distribution is close to the Gaussian distribution when
the SNR is low, contrary to the SE ARMA process. Thus, the two SE
processes can be easily distinguished for low SNRs. Fig. 2 presents
the KP detector performance for fixed SNR = 8 dB as a function of the
number of samples N. Obviously, the higher N, the better the detector
performance. A comparison between Figs. 2 and 3 shows that the
detector performance is relatively similar for known and unknown
parameters. Fig. 4 presents the UP detector ROCs obtained for
different overestimated orders p' (SNR = 8 dB and N = 500). Clearly,
the higher p', the better the detector performance. This can be
explained as follows. First, the model order mismatch does not
deteriorate the detector performance, as proved in section 5. Second,
the number of cumulants involved in the test function increases when
p' increases. Consequently, the higher p', the better the information
concerning the process structure.


Fig. 2. KP Detector ROCs for different numbers of
samples (SNR = 8 dB).


Fig. 3. UP Detector ROCs for different numbers of
samples (SNR = 8 dB).


Fig. 4. UP Detector ROCs for different values of the
overestimated order p' (SNR = 8 dB, N = 500).

7  Summary  and  Conclusions 

Higher-order statistics were shown to be an efficient tool for the SE
process detection. A likelihood ratio test based on higher-order
Yule-Walker equations was derived for the supervised learning case.
The proposed test was suboptimal since it did not work on the data
themselves. However, it did not require any statistical assumption
for the driving noise and showed good performance. The basic idea of
this detector was to test the nullity of the higher-order Yule-Walker
matrix determinant. The detector was then generalized to the
unsupervised learning case, where no a priori information was
available. The corresponding detector showed satisfying performance.
Moreover, the detector was shown to be robust to model order
overestimation.

This work can be extended to the detection of processes whose spectra
are slightly different. Like SE processes, these processes cannot be
easily distinguished using techniques based on second-order
statistics. The detection may be achieved using higher-order
statistics. Unfortunately, this work was limited to linear processes.
The detection of SE processes which cannot be modeled by linear
processes is currently under investigation.

8  Appendix:  Generalized  Likelihood 
Ratio  Detector 


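Since $\Delta_k^{1}$, $\sigma_0^2(N)$, and $\sigma_1^2(N)$ are the
unknown parameters, the GLR statistic based on the observations
$(\hat{\Delta}_{k,j})_{j=1,\ldots,M}$ is defined by:

$$GLR = \frac{\sup_{\Delta \neq 0;\, \sigma^2 > 0} (\sigma^2)^{-M/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{j=1}^{M} (\hat{\Delta}_{k,j} - \Delta)^2 \right)}{\sup_{\sigma^2 > 0} (\sigma^2)^{-M/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{j=1}^{M} \hat{\Delta}_{k,j}^2 \right)} \qquad (23)$$

It is well known that the likelihood function suprema are obtained
for:

$$\Delta = \bar{\Delta}_k = \frac{1}{M} \sum_{j=1}^{M} \hat{\Delta}_{k,j}, \quad \sigma^2 = \frac{1}{M} \sum_{j=1}^{M} (\hat{\Delta}_{k,j} - \bar{\Delta}_k)^2 \text{ under } H_1, \quad \sigma^2 = \frac{1}{M} \sum_{j=1}^{M} \hat{\Delta}_{k,j}^2 \text{ under } H_0 \qquad (24)$$

The GLR test for deciding between $H_0$ and $H_1$ then reduces to:

$$H_0 \text{ rejected if } GLR = \left( \frac{\sum_{j=1}^{M} (\hat{\Delta}_{k,j} - \bar{\Delta}_k)^2}{\sum_{j=1}^{M} \hat{\Delta}_{k,j}^2} \right)^{-M/2} > \tau_3, \qquad (25)$$

i.e.

$$\frac{\sum_{j=1}^{M} \hat{\Delta}_{k,j}^2}{\sum_{j=1}^{M} (\hat{\Delta}_{k,j} - \bar{\Delta}_k)^2} > \tau'_3 \qquad (26)$$

In (25) and (26), $\tau_3$ and $\tau'_3 = \tau_3^{2/M}$ are thresholds
depending on the false-alarm probability, chosen such that
$\tau'_3 > 1$ in order to avoid rejecting $H_0$ in any case (indeed,
$\sum_{j=1}^{M} (\hat{\Delta}_{k,j} - \bar{\Delta}_k)^2 \leq \sum_{j=1}^{M} \hat{\Delta}_{k,j}^2$,
so that $GLR \geq 1$ for all
$(\hat{\Delta}_{k,1}, \ldots, \hat{\Delta}_{k,M}) \in \mathbb{R}^M$).
Moreover, since
$\sum_{j=1}^{M} \hat{\Delta}_{k,j} = M \bar{\Delta}_k$, eq. (26) leads
to:

$$H_0 \text{ rejected if } \left( 1 - (\tau'_3)^{-1} \right) \sum_{j=1}^{M} \hat{\Delta}_{k,j}^2 < M \bar{\Delta}_k^2, \qquad (27)$$

i.e.

$$(M-1)\, \frac{\bar{\Delta}_k^2}{S^2} > (M-1)\, (\tau'_3 - 1), \qquad (28)$$

where
$S^2 = \frac{1}{M} \sum_{j=1}^{M} (\hat{\Delta}_{k,j} - \bar{\Delta}_k)^2$
is the sample variance of the sequence. Consequently, denoting
$\tau_2 = (M-1)(\tau'_3 - 1)$, eq. (28) is identical to eq. (19).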
Abstract

Characterizing signals jointly in the time and frequency domains
through time-frequency representations (TFRs) such as the
Wigner-Ville Distribution (WVD) is a natural extension of Fourier
analysis and gives a more complete representation of signal behavior,
particularly in the case of non-stationary signals. In the presence
of additive impulsive noise, TFRs quickly break down and any
information about the desired signal is lost. To combat these
effects, we propose in this paper a family of memoryless
nonlinearities which have been shown to produce a signal
autocorrelation statistic which is well-behaved in the presence of
stable noise. The result of this approach is a TFR which is both
robust and simple to implement, and has many of the mathematical
properties associated with the standard WVD. We illustrate the
improvement in performance that can be obtained with several
examples.


1.  Introduction 

Time-frequency representations (TFRs) are useful tools for
characterizing the behavior of signals whose spectral characteristics
are time-varying. Because of the ease with which they can be
implemented, they have been applied to a wide variety of problems in
signal processing. Most notably, they have been used for signal
recovery at low signal-to-noise ratios (SNRs), accurate estimation of
instantaneous frequency and group delay, signal detection in
communications, radar, and sonar applications, and the design of
time-varying signals and filters.

The Wigner-Ville Distribution (WVD) is one of the most commonly-used
quadratic TFRs (QTFRs) and has the form

$$W_x(t, f) = \int_{\tau} x(t + \tau/2)\, x^{*}(t - \tau/2)\, e^{-j 2\pi f \tau}\, d\tau, \qquad (1)$$

where $\int_{u} (\cdot)\, du$ denotes integration with respect to u
from $-\infty$ to $\infty$. The WVD is a particularly useful
representation of the behavior of x(t) because it is real everywhere
and so acts as a density function describing the concentration of the
energy of x(t) over the time-frequency plane. This interpretation is
justified in part by the fact that the signal energy, $E_x$, may be
obtained by taking

$$E_x = \int_{t} \int_{f} W_x(t, f)\, dt\, df, \qquad (2)$$

and marginal distributions of the signal's energy concentration in
time or frequency alone are respectively obtained by integrating the
WVD with respect to the opposing variable. It has many other
desirable mathematical properties, which are discussed in detail in
[5], [6], and [10]. The discrete-time WVD is

$$W_x(n, F) = 2 \sum_{m} x(n+m)\, x^{*}(n-m)\, e^{-j 4\pi F m}, \qquad (3)$$

and has a periodicity of 1/2 along the normalized frequency axis. As
a result of this effect, which has been studied in [3] and [4] in
detail, it is necessary to either sample the signal at twice its
Nyquist rate or use analytic signals sampled at the Nyquist rate in
order to avoid aliasing.

* This work was supported in part by the National Science Foundation
under grant MIP-9530923.
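
A direct evaluation of the discrete-time WVD (3) at a single time-frequency point can be sketched as follows. This is a minimal illustration, not an efficient implementation: the sum is restricted to the lags for which both samples exist in a finite record, and the function name and the test signal are hypothetical.

```python
import cmath

def wvd_point(x, n, freq):
    """W_x(n, F) = 2 * sum_m x(n+m) x*(n-m) exp(-j 4 pi F m),
    summed over the lags m for which both samples are available."""
    max_lag = min(n, len(x) - 1 - n)
    total = 0j
    for m in range(-max_lag, max_lag + 1):
        total += (x[n + m] * x[n - m].conjugate()
                  * cmath.exp(-4j * cmath.pi * freq * m))
    return 2 * total

# Analytic test signal: a complex exponential at normalized frequency 0.1.
f0 = 0.1
x = [cmath.exp(2j * cmath.pi * f0 * n) for n in range(33)]

peak = wvd_point(x, 16, f0)             # evaluated on the signal frequency
off_peak = wvd_point(x, 16, 0.2)        # evaluated away from it
shifted = wvd_point(x, 16, f0 + 0.5)    # one period (1/2) further along F
```

For this signal the value on the ridge is real and much larger than the off-ridge value, and `shifted` coincides with `peak`, which illustrates the period of 1/2 along the normalized frequency axis.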

The WVD can be used to analyze signals that are contaminated by
Gaussian noise. However, there are a number of situations, e.g. sonar
or synthetic aperture radar, where the additive channel noise is
characterized by large numbers of impulses. In this case, the WVD can
be severely degraded, as large numbers of artifacts can be produced
by additive impulses.

In this paper, we propose a modification of the WVD which yields a
more robust QTFR. This approach is based on the interpretation of the
WVD as a time-varying power spectral density. By replacing the
autocorrelation function, which has been shown to be severely
affected by impulsive noise, with a more robust statistic, we derive
a QTFR that also exhibits significant improvement over comparable
QTFRs.
The organization of this paper is as follows. Models for impulsive
noise are introduced in Section 2, and are accompanied by a
discussion of the effect of impulses on WVDs. A robust WVD using
fractional lower-order statistics is then introduced in Section 3,
and its properties are examined. Examples which illustrate the
improvement in performance which can be obtained using fractional
lower-order moments are presented in Section 4, and a summary of the
work is given in Section 5.

2.  Impulsive  Noise  and  Its  Effect  on  TFRs 

Impulsive  noise  arises  in  a  variety  of  situations,  such  as 
atmospheric  noise,  as  discussed  by  Middleton  [8],  and  in 
radar  and  sonar  applications.  A  key  feature  of  all  impulsive 
noise  models  is  that  the  tails  of  their  associated  density 
functions  decay  at  a  rate  that  is  slower  than  the  rate  of  decay 
of  the  tails  of  a  Gaussian  density  function. 

2.1. α-Stable Noise Distributions

The family of α-stable distributions is defined by its members'
characteristic functions, which for α-stable distributions that are
symmetric about their location, $\eta$, have the form

$$\varphi(\omega) = e^{\, j \eta \omega - \gamma |\omega|^{\alpha}}, \qquad (4)$$

where $\gamma$ is the dispersion and $\alpha$ is the stability
parameter. The most well-known members of the family of symmetric
α-stable (SαS) densities are the Gaussian density, which we obtain
when α = 2, and the Cauchy density, which arises when α = 1. These
are the only SαS density functions which can be obtained from (4) in
closed form. The stability parameter has the most impact on the shape
of these density functions because it controls the amount of mass in
the tails. As α decreases from 2 to 0, we observe an increase in the
amount of the total mass that lies in the tails, corresponding to an
increase in impulsiveness. α-stable distributions are described in
more detail in [9] and [11].
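
For experiments, zero-location SαS noise with characteristic function exp(-γ|ω|^α) can be drawn with the Chambers-Mallows-Stuck construction. The text does not say how its noise was generated, so the sketch below is an assumption: it uses the standard symmetric-case formula, and the function names, default dispersion and seed are hypothetical.

```python
import math
import random

def sas_from_uniform_exp(alpha, u, w, gamma=1.0):
    """Map U in (-pi/2, pi/2) and W ~ Exp(1) to a zero-location SaS variate
    (Chambers-Mallows-Stuck construction, symmetric case)."""
    if alpha == 1.0:
        x = math.tan(u)  # Cauchy case
    else:
        x = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
             * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    # Scaling by gamma^(1/alpha) sets the dispersion to gamma.
    return gamma ** (1.0 / alpha) * x

def sas_noise(alpha, n, gamma=1.0, seed=3):
    """Draw n i.i.d. SaS samples with stability alpha and dispersion gamma."""
    rng = random.Random(seed)
    return [sas_from_uniform_exp(alpha,
                                 rng.uniform(-math.pi / 2, math.pi / 2),
                                 rng.expovariate(1.0), gamma)
            for _ in range(n)]

cauchy = sas_noise(1.0, 1000)     # alpha = 1
stable_15 = sas_noise(1.5, 1000)  # alpha = 1.5
```

With α = 2 the mapping reduces to 2 sin(U) sqrt(W), a Gaussian variate of variance 2, which agrees with (4) for γ = 1.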

2.2.  Effects  of  Impulses  on  TFRs 

Impulsive noise profoundly degrades TFRs, although the type of
degradation depends on the type of TFR being used. In the analysis
that follows, we employ a highly simplified model of a signal
corrupted by impulsive noise which still accurately predicts the
effect of strong impulses on the TFR of the signal being observed. We
do our modeling in continuous time, but analogous effects can be
observed in discrete-time signals in impulsive noise. The signal r(t)
is modeled as a finite-duration signal which is corrupted by a single
Dirac delta function, occurring at some random time $t_0$ uniformly
distributed over the signal's time support, so that

$$r(t) = x(t) + \delta(t - t_0). \qquad (5)$$

The WVD of r(t) is

$$W_r(t, f) = W_x(t, f) + 2\,\mathrm{Re}[x(2t - t_0)] + \delta(t - t_0), \qquad (6)$$

where Re[z] is the real part of z. Here we have multiple undesired
artifacts due to the presence of the impulse. We first have the WVD
of the impulse itself, which is an impulsive “ridge” at $t = t_0$
that occupies all frequencies. In addition, we have cross terms,
which arise because of the quadratic structure of the WVD. The cross
term is a time-scaled replica of the real part of the signal x(t),
shifted in time by $t_0$. The cross terms extend over all
frequencies, and therefore often overlap the WVD of x(t). It is
because of this overlap that one often cannot apply smoothing
functions, which are used to attenuate cross terms in the WVD and are
described in [5], [6], and [10]. Smoothing functions are effective
only on cross terms that are separated from the WVD of the desired
signal in time and frequency, which is often not the case for cross
terms arising from impulsive noise effects.

3.  The  Fractional  Lower-Order  WVD 
(FLOWVD) 

We begin by considering the WVD as a tool for characterizing the
spectral properties of non-stationary random processes. Since the WVD
is formed by taking the Fourier transform of the autocorrelation of a
signal x(t) with respect to the delay variable $\tau$, we can write
the evolutive spectrum of a random process [1] as

$$S_x(t, f) = \int_{\tau} E\{ x(t + \tau/2)\, x^{*}(t - \tau/2) \}\, e^{-j 2\pi f \tau}\, d\tau. \qquad (7)$$

By interchanging the integration and expectation operators, we see
that the evolutive spectrum can be interpreted as the expectation of
all the WVDs corresponding to their respective members of the
ensemble of random processes {x(t)},

$$S_x(t, f) = E\{ W_x(t, f) \}. \qquad (8)$$

In the presence of α-stable noise, the higher-order moments of
signals become unbounded. This makes second-order measures such as
the autocorrelation function useless for examining α-stable signals.
It has been shown in [7] that the variance of the autocorrelation
becomes infinite for α < 2 and the mean of the autocorrelation is
infinite for α < 1. The solution to this problem, introduced in [7],
was to define a fractional lower-order covariance (FLOC) measure,
which has the form

$$\mathrm{FLOC}^{a}(t, \tau) = E\{ x^{\langle a \rangle}(t + \tau/2)\, x^{-\langle a \rangle}(t - \tau/2) \}, \qquad (9)$$
where $(\cdot)^{\langle a \rangle}$ is the ath-order phased
fractional lower-order moment (PFLOM) operator, where
$0 \leq a \leq 1$. It is defined as

$$z^{\langle a \rangle} = |z|^{a+1} / z^{*}. \qquad (10)$$

If we write z in polar coordinates, $z = r e^{j\theta}$, then it is
easy to show that $z^{\langle a \rangle} = r^{a} e^{j\theta}$, so
that the PFLOM acts only on the magnitude of its operand and
preserves its phase. Also, in (9), we have defined
$z^{-\langle a \rangle} = (z^{*})^{\langle a \rangle} = (z^{\langle a \rangle})^{*}$.
If a = 1, the FLOC becomes the autocorrelation function. If a = 0,
all amplitude information is removed; this is known as the phased
FLOC. Ma and Nikias showed in [7] that the FLOC has finite mean and
variance if $0 < a < \alpha/2$.
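
The PFLOM operator of (10) is a one-line memoryless nonlinearity. The sketch below assumes that zero is mapped to zero, a convention the definition leaves open, and uses hypothetical function names.

```python
import cmath

def pflom(z, a):
    """z^<a> = |z|^(a+1) / conj(z) = |z|^a * exp(j * arg z).
    Zero is mapped to zero (a convention assumed here)."""
    z = complex(z)
    if z == 0:
        return 0j
    return abs(z) ** (a + 1) / z.conjugate()

def pflom_conj(z, a):
    """z^-<a> = (z*)^<a> = (z^<a>)*."""
    return pflom(z, a).conjugate()

z = 3 + 4j
compressed = pflom(z, 0.5)      # magnitude 5 -> sqrt(5), phase unchanged
outlier = pflom(1000.0, 0.1)    # a large impulse is strongly compressed
```

The magnitude compression is what limits the influence of impulses: with a = 0.1, an outlier of amplitude 1000 is reduced to roughly 2 while its phase is kept.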

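We can define a fractional lower-order evolutive spectrum by taking
the Fourier transform of the FLOC; this gives us

$$S_x^{(a)}(t, f) = \int_{\tau} \mathrm{FLOC}^{a}(t, \tau)\, e^{-j 2\pi f \tau}\, d\tau, \qquad (11)$$

which can be written as the expectation of an ensemble of TFRs of the
form

$$W_x^{(a)}(t, f) = \int_{\tau} x^{\langle a \rangle}(t + \tau/2)\, x^{-\langle a \rangle}(t - \tau/2)\, e^{-j 2\pi f \tau}\, d\tau, \qquad (12)$$

which we define to be the fractional lower-order Wigner-Ville
distribution (FLOWVD) of x(t). The discrete-time version of the
FLOWVD is

$$W_x^{(a)}(n, F) = 2 \sum_{m} x^{\langle a \rangle}(n+m)\, x^{-\langle a \rangle}(n-m)\, e^{-j 4\pi F m}, \qquad (13)$$

and we can directly define a general fractional lower-order TFR in a
fashion analogous to that for Cohen's class of energetic QTFRs [5] as

$$T_x^{(a)}(t, f) = \int_{\tau} \left[ K_x^{(a)}(t, \tau) *_t \Phi(t, \tau) \right] e^{-j 2\pi f \tau}\, d\tau, \qquad (14)$$

where $K_x^{(a)}(t, \tau)$ is the autocorrelation of
$x^{\langle a \rangle}(t)$, $\Phi(t, \tau)$ is the smoothing kernel
and $*_t$ denotes convolution in the time domain.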

The FLOWVD is itself a standard WVD, where the input signal is
$x^{\langle a \rangle}(n)$ rather than x(n). As a result of this, it
is easy to show that the FLOWVD has all the properties that the WVD
itself has. The FLOWVD can, for example, be shown to be real and
time-frequency shift-invariant. It also obeys the marginal
properties, although the marginal energy densities are associated
with $x^{\langle a \rangle}(n)$, not x(n). For instance, if we
examine the continuous-time FLOWVD, we can show that the marginal
energy density with respect to time is

$$\int_{f} W_x^{(a)}(t, f)\, df = x^{\langle a \rangle}(t)\, x^{-\langle a \rangle}(t). \qquad (15)$$
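
The time marginal can be checked numerically on the discrete-time FLOWVD. The sketch below is illustrative: it evaluates the discrete FLOWVD as the discrete WVD of the PFLOM-transformed signal and integrates it over one frequency period of length 1/2 with a Riemann sum. The grid size, the test signal with a single outlier and the function names are hypothetical.

```python
import cmath

def pflom(z, a):
    """z^<a> = |z|^(a+1) / conj(z); zero is mapped to zero."""
    z = complex(z)
    return 0j if z == 0 else abs(z) ** (a + 1) / z.conjugate()

def flowvd_point(x, n, freq, a):
    """Discrete FLOWVD: the discrete WVD applied to x^<a>."""
    xa = [pflom(v, a) for v in x]
    max_lag = min(n, len(xa) - 1 - n)
    total = 0j
    for m in range(-max_lag, max_lag + 1):
        total += (xa[n + m] * xa[n - m].conjugate()
                  * cmath.exp(-4j * cmath.pi * freq * m))
    return 2 * total

# Complex exponential with one large outlier at n = 8.
x = [cmath.exp(2j * cmath.pi * 0.1 * n) for n in range(17)]
x[8] *= 50

n, a, grid = 8, 0.25, 64
# Riemann sum over one period [0, 1/2) of the normalized frequency axis.
marginal = sum(flowvd_point(x, n, k / (2 * grid), a) for k in range(grid)) / (2 * grid)
```

The result equals |x(n)|^(2a), about 7.07 here, instead of the |x(n)|^2 = 2500 that the ordinary WVD marginal would give, so the outlier contributes far less energy to the representation.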


In  addition,  the  fractional  lower-order  ambiguity  func¬ 
tion  (FLOAF),  defined  in  [7],  can  be  obtained  from  the 
FLOWVD  by  applying  a  2-D  Fourier  transform  to  (12)  in 
the  same  way  that  the  symmetric  ambiguity  function  is  ob¬ 
tained  from  the  WVD. 

4.  Results 


In this section we demonstrate the performance gains that can be
attained by applying fractional lower-order moments to signals prior
to computing the WVD. The signal we will use is a sampled
echolocation signal from the Large Brown Bat, Eptesicus fuscus, where
the sampling period was 6.25 μsec. (The authors thank C. Condon, K.
White, and Prof. A. Feng of the Beckman Institute of the University
of Illinois for the bat data and for permission to use it in this
paper.) A time history of the signal is given in Fig. 1(a) and the
WVD of the signal is shown in Fig. 1(b), where we have used the
Choi-Williams smoothing kernel, which has the form

$$\Phi(n, m) = e^{-\sigma n^2 / (4 m^2)}$$

with σ = 15. The signal is composed of two prominent nonlinear FM
chirps, each approximately 1 msec in length. To avoid aliasing
effects, we have created an analytic signal by adding the
echolocation signal's Hilbert transform to it as the imaginary part.


Figure 1. (a): Time history of bat echolocation
pulse, (b): Wigner-Ville Distribution of echolocation
pulse with Choi-Williams smoothing kernel.


To the echolocation signal in Fig. 1 we have added two types of
isotropic α-stable noise; we used Cauchy noise (α = 1) and α-stable
noise with α = 1.5. The WVDs of both corrupted signals are shown in
Fig. 2. In both plots the interference terms discussed earlier are
readily apparent, particularly the strong ridges located in time at
the location of impulse events. We can also see intermodulation
effects between the impulsive ridges, which arise due to the
quadratic structure of the WVD. Note the lack of effect that the
Choi-Williams kernel, also applied here, has on the impulse-related
artifacts.


Figure 2. WVDs of noisy echolocation signal
in stable noise with (a): α = 1, (b): α = 1.5.

In Fig. 3, we show the WVDs that are produced when PFLOM-type
influence functions are used. In Fig. 3(a) we have used a = 0.1 for
the Cauchy noise, and we have used a = 0.25 for the α = 1.5 noise, in
Fig. 3(b). We then applied the same Choi-Williams kernel to both
representations.


Figure 3. FLOWVDs of noisy echolocation signal
in stable noise with (a): α = 1, (b): α = 1.5.


For some applications involving impulsive noise, applying a median
filter to the signal before computing its WVD is a useful way to
remove the effect of impulses. This approach works well if the
desired signal does not have significant energy in the frequencies
near its Nyquist frequency, and if the additive impulses are not
dense. As an example of the performance degradation that a median
pre-filter can suffer in an impulse-heavy environment, we consider
the case where the echolocation signal is contaminated by additive
Cauchy noise. In Fig. 4, we show the results of median pre-filtering.
A length-3 pre-filter was used in Fig. 4(a); here the high density of
impulses results in many outliers remaining in the signal after
processing. These unwanted impulses can be removed by using a wider
filter, as shown in Fig. 4(b), but this causes many signal details to
be lost. The end result is that in Cauchy noise, the median filter
does not yield satisfactory results.


Figure 4. WVD of echolocation signal in
Cauchy noise via prefiltering with (a) length-3
median filter, (b) length-5 median filter.
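
The trade-off of median pre-filtering can be reproduced on toy data. The sketch below is only an illustration: it uses a sliding-window median with windows truncated at the edges (an assumed boundary rule), hypothetical function and variable names, and short hand-made sequences rather than the echolocation signal.

```python
import statistics

def median_prefilter(x, length):
    """Sliding-window median of odd length; windows are truncated at the edges."""
    half = length // 2
    return [statistics.median(x[max(0, n - half): n + half + 1])
            for n in range(len(x))]

# A constant signal of level 1 with one isolated impulse, then with two
# adjacent impulses (a "dense" burst).
isolated = [1, 1, 1, 100, 1, 1, 1, 1, 1]
dense = [1, 1, 1, 100, 100, 1, 1, 1, 1]

clean_isolated = median_prefilter(isolated, 3)  # impulse removed
residual_3 = median_prefilter(dense, 3)         # both impulses survive
residual_5 = median_prefilter(dense, 5)         # impulses removed

# A genuine two-sample signal feature is removed in exactly the same way.
detail = [0, 0, 0, 5, 5, 0, 0, 0, 0]
flattened = median_prefilter(detail, 5)
```

The length-5 window cannot tell a two-sample burst of impulses from a two-sample signal feature, which is the loss of detail observed with the wider filter.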


To demonstrate the effect that fractional lower-order moments have on
the WVD of an impulsive signal, we apply the signal reconstruction
algorithm developed by Boudreaux-Bartels and Parks in [2]. We first
isolate the second chirp in the clean signal shown in Fig. 1(a) using
a masking function, which sets all values of the WVD outside a
desired region (in this case, a trapezoid) to zero. The isolated
chirp is shown in Fig. 5(a). The reconstruction algorithm is then
applied to the masked WVD, and the resulting nonlinear FM signal is
shown in Fig. 5(b). By applying the same mask to the FLOWVD of the
echolocation signal in Fig. 3(a), we obtain the TFR in Fig. 5(c). If
we apply the algorithm from [2] to this TFR, we obtain a
reconstruction of part of $x^{\langle a \rangle}(t)$. To obtain the
chirp from x(t) itself, it is necessary to undo the nonlinearity by
applying a PFLOM of order 1/a to the output of the reconstruction
algorithm; the result is shown in Fig. 5(d). Note that applying this
second nonlinearity tends to emphasize some of the variations that
often arise in the envelope of the reconstructed signal.
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Figure  5.  (a):  Masked  WVD  of  clean  echolo- 
cation  signal,  (b):  Reconstructed  chirp  from 
masked  WVD.  (c):  Masked  FLOWVD  of  e- 
cholocation  signal  in  Cauchy  noise,  (d):  Re¬ 
constructed  chirp  from  masked  FLOWVD. 


5.  Summary 


In this paper, we have considered the effect of additive impulsive noise modeled by α-stable distributions on quadratic time-frequency representations such as the Wigner-Ville Distribution. We have shown that the presence of impulses in additive noise produces several undesirable effects. These effects include the generation of time-scaled replicas of the desired signal as well as strong ridges extending over all frequencies at the time location of outliers.

To  combat  these  effects,  we  introduced  a  robust  evolutive 
spectrum,  based  on  the  fractional  lower-order  covariance. 
From  this,  we  were  able  to  define  the  FLOWVD,  which  has 
many  of  the  desirable  mathematical  properties  associated 
with  the  standard  WVD. 

To demonstrate the performance improvement that can be realized with such a technique, we examined a bat echolocation signal which we contaminated with two different α-stable noise types. We showed that the standard WVD breaks down completely in the presence of such noise, and we showed that techniques such as median pre-filtering do not always produce good results. We then examined the FLOWVD for these noise types and showed that it was more robust than the WVD. Further, we were able to isolate components of the original signal by masking the FLOWVD and applying standard reconstruction algorithms.

Application of the Positive Alpha-Stable Distribution

Abstract

Many physical phenomena are non-Gaussian, and if the observed data have frequently occurring extreme values, then the phenomena may be modeled as a random process with an alpha-stable distribution. When positive and negative outcomes are equally likely, the process would be symmetric alpha-stable (SαS); however, when only positive outcomes are possible, the process would be positive alpha-stable (PαS). Phenomena related to energy or power are examples. This paper presents the characteristics and potential applications for the PαS distribution. For this distribution all negative-order moments exist, and ratios of these moments are used to estimate alpha. Application areas that will be examined include: seismic activity, ocean wave variability, and radar sea clutter modulation. The correlation properties of these data are examined.


1.  Introduction 

The alpha-stable distribution is an important area for investigation because this distribution is expected from superposition in natural processes. The parameter alpha, α, is the characteristic exponent that varies over 0 < α ≤ 2. The distribution includes the Gaussian when α = 2. For α less than two, the distribution becomes more impulsive, more non-Gaussian in nature, and the tails of the distribution become thicker. This makes the alpha-stable distribution an attractive choice for modeling signals and noise having an impulsive nature. Also, from the generalized central limit theorem, the stable distribution is the only limiting distribution for sums of independent and identically distributed (IID) random variables (stability property). If the individual distributions have finite variance, then the limiting distribution is Gaussian. For α less than two, the individual distributions have infinite variance. For detailed information, see the books by Nikias and Shao [1] and Samorodnitsky and Taqqu [2].

Sources that could follow or be modeled by the alpha-stable distribution are abundant and include lightning in the atmosphere, switching transients in power lines, static in telephone lines, seismic activity, climatology and weather, ocean wave variability, surface texture, the slamming of a ship hull in a seaway, acoustic emissions from cracks growing in engineering materials under stress, etc. Many sources can exist in the area of target and background signatures that affect detection and classification. In underwater acoustics, examples of these sources could include interference to target detection such as ice cracking, biologics, bottom and sea clutter in active sonar, and ocean waves near the surface and in the surf zone. They could also include target characteristics such as target strength in active sonar and cavitation. Similar sources in radar and infrared can include ocean waves in the form of sea clutter and radar cross section (RCS); see Pierce [3].

This paper will examine characteristics and applications of the positive alpha-stable (PαS) distribution. This distribution is related to the symmetric alpha-stable (SαS) distribution in that every SαS random variable can be represented as the product (or modulation) of a zero mean Gaussian random variable and the square-root of a PαS random variable, where the alpha parameter for the PαS distribution is half the alpha associated with the SαS distribution [2]. The PαS alpha varies over 0 < α < 1. The Gaussian and PαS distributions must be statistically independent. This property is regularly used for the computer generation of SαS random variables. From this property and the stability property, the estimation of signal power (second moment estimates) from short averages of SαS random variables approaches a PαS random variable where the alpha is unchanged by averaging. The PαS random process could be observed when energy or power varies over time or space.

The general theory and methods for PαS processes are developed and shown where they can be related to power or energy flow in physical processes. This includes derivation of the moments for this distribution, the range of support for these moments (all negative order moments exist) and the statistical error associated with estimating these moments. A method for estimating alpha is presented which uses the ratio of negative order moments. A method is presented for estimating the correlation between samples from PαS processes. Results are presented using these methods with actual data.

2. Positive alpha-stable distribution


The positive alpha-stable distribution has the characteristic function

$$\varphi(t) = \exp\left(-\gamma\,|t|^{\alpha}\left[1 + i\,\mathrm{sgn}(t)\tan(\alpha\pi/2)\right]\right) \tag{1}$$

where

$$\mathrm{sgn}(t) = \begin{cases} 1, & t > 0 \\ 0, & t = 0 \\ -1, & t < 0 \end{cases} \tag{2}$$

and

$$0 < \alpha < 1, \quad \gamma > 0 \tag{3}$$

The general characteristic function is from [1], where the location parameter is set to zero and the symmetry parameter is one (β = 1). The characteristic exponent is α, and the dispersion or scale parameter is γ.

Closed-form expressions do not exist for the PαS probability density function (PDF) except for the Pearson (or Lévy) distribution (α = 0.5). Fig. 1 presents a family of PαS PDFs obtained by numerically taking the inverse Fourier transform of the characteristic function given in (1). Alpha varies from 0.3 to 0.95 in steps of 0.05. As alpha approaches one, the density function becomes more and more centered about the normalized amplitude of one. The tails show the power-law or algebraic asymptote that is a characteristic of all alpha-stable distributions. The amplitude in Fig. 1 is normalized by the negative first-order moment; the topic of moments is discussed next.

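The moments for the PαS distribution are defined by the expectation

$$m_p = E\left[x^p\right] \tag{4}$$

where x is a PαS random variable, and for sampled data, $x_n$ (n = 1, ..., N), the moment is estimated by

$$\hat{m}_p = \frac{1}{N}\sum_{n=1}^{N} x_n^p \tag{5}$$

One can derive directly, or determine from [2], the moments for 0 < p < α:

$$m_p = \gamma^{p/\alpha}\,\frac{\sin(\pi p)\,\Gamma(1+p)}{\alpha\,\sin(\pi p/\alpha)\,\Gamma(1+p/\alpha)}\left[1+\tan^2\!\left(\frac{\alpha\pi}{2}\right)\right]^{p/(2\alpha)} \tag{6}$$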


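Although negative moments are not considered in [2], the negative order moments of all orders (−∞ < p < 0) are

$$m_p = \gamma^{p/\alpha}\,\frac{\Gamma(-p/\alpha)}{\alpha\,\Gamma(-p)}\left[1+\tan^2\!\left(\frac{\alpha\pi}{2}\right)\right]^{p/(2\alpha)} \tag{7}$$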


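In this paper, the negative first-order moment is used for normalization, such that the product $x\,m_{-1}$ has a negative first-order moment of one and dispersion

$$\gamma = \left[\frac{\Gamma(1/\alpha)}{\alpha}\right]^{\alpha}\left[1+\tan^2\!\left(\frac{\alpha\pi}{2}\right)\right]^{-1/2} \tag{8}$$

Fig. 1. Probability density function for PαS distribution from α = 0.3 to α = 0.95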


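Taking the ratio of moments gives a method for estimating alpha. In this paper, the selected ratio is

$$r = \frac{m_{-0.5}^2}{m_{-1}} = \frac{\Gamma^2\!\left(\frac{1}{2\alpha}\right)}{\Gamma\!\left(\frac{1}{\alpha}\right)}\,\frac{1}{\pi\alpha} \tag{9}$$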

The moments are estimated using (5), and the inverse of (9) is approximated over the range 0.3 ≤ α(r) < 1 as the polynomial

$$\alpha(r) = -29.8573\,r^8 + 135.681\,r^7 - 254.293\,r^6 + 255.965\,r^5 - 149.978\,r^4 + 52.2708\,r^3 - 10.4879\,r^2 + 1.60231\,r + 0.0965472 \tag{10}$$
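The moment ratio method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients are read from (10) as a degree-8 polynomial in r, and the test data come from a hypothetical helper, `positive_stable`, which uses Kanter's representation of a positive stable variate (a generator chosen here for convenience; the paper itself refers to the algorithm in [2]).

```python
import math
import random

# Coefficients of the polynomial in (10), highest power first (r^8 down to r^0).
POLY_10 = [-29.8573, 135.681, -254.293, 255.965, -149.978,
           52.2708, -10.4879, 1.60231, 0.0965472]


def alpha_from_ratio(r):
    """Evaluate the polynomial approximation of alpha(r) by Horner's rule."""
    acc = 0.0
    for c in POLY_10:
        acc = acc * r + c
    return acc


def sample_moment(samples, p):
    """Sample moment of order p, as in (5)."""
    return sum(x ** p for x in samples) / len(samples)


def estimate_alpha(samples):
    """Moment ratio estimate: r = m_{-0.5}^2 / m_{-1}, then alpha(r)."""
    m_half = sample_moment(samples, -0.5)
    m_one = sample_moment(samples, -1.0)
    return alpha_from_ratio(m_half ** 2 / m_one)


def positive_stable(alpha, rng):
    """One positive stable variate, 0 < alpha < 1 (Kanter's representation).

    Hypothetical helper for this sketch, not the generator used in the paper.
    """
    u = math.pi * (1.0 - rng.random())   # uniform on (0, pi]
    w = rng.expovariate(1.0)             # unit-mean exponential
    a = math.sin(alpha * u) / math.sin(u) ** (1.0 / alpha)
    b = (math.sin((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    return a * b


rng = random.Random(0)
data = [positive_stable(0.6, rng) for _ in range(100000)]
alpha_hat = estimate_alpha(data)   # expected to land close to 0.6
```

Because the ratio in (9) does not depend on the dispersion, the data need no prior scaling before `estimate_alpha` is applied.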


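The random error associated with moment estimation is derived as

$$\mathrm{VAR}\left[\hat{m}_p\right] = \frac{1}{N}\left(m_{2p} - m_p^2\right) \tag{11}$$

The p-th root is taken to scale the moment estimate such that the result is proportional to amplitude. For large sample size, the approximate variance is

$$\mathrm{VAR}\left[\hat{m}_p^{1/p}\right] \approx \frac{1}{p^2}\,m_p^{2/p-2}\,\frac{m_{2p} - m_p^2}{N} \tag{12}$$

The normalized error becomes

$$\varepsilon\left[\hat{m}_p^{1/p}\right] = \frac{\sqrt{\mathrm{VAR}\left[\hat{m}_p^{1/p}\right]}}{m_p^{1/p}} = \frac{1}{|p|}\,\frac{\sqrt{m_{2p} - m_p^2}}{\sqrt{N}\,m_p} \tag{13}$$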


Plots of (13) for a range of alpha are given in Fig. 2. For moments with order less than -0.5 and alpha greater than 0.4, the error is determined more by alpha than by the order of the moment. The use of -0.5 and -1 order moments for alpha estimation and normalization appears to be reasonable for most applications.

Fig. 2. Normalized error for moments of PαS distribution for α = 0.1 to α = 0.95

The usual moments of mean and mean-square are infinite, and estimates of these moments are not consistent. The median of the PαS distribution, however, is also a measure of the distribution's scale. The relationship between median and the moments is established by using PDF calculations and numerical integration (not shown).

Autoregressive and moving average models of PαS processes must have positive or zero coefficients. Otherwise, the sum would not be PαS distributed (stability property).

From [2], the SαS random variable is related to the PαS random variable. If g is a zero mean Gaussian random variable and x is a PαS random variable, where g and x are statistically independent, then

$$z = x^{1/2}\,g \tag{14}$$

is a SαS random variable with characteristic exponent

$$\alpha_{S\alpha S} = 2\,\alpha_{P\alpha S} \tag{15}$$

This property shows that every SαS random variable is conditionally Gaussian [2].
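The product representation in (14) and (15) lends itself to a short simulation sketch. The generator below is an assumption made for illustration (Kanter's representation of a positive stable variate, via the hypothetical helper `positive_stable`), not the algorithm of [2]. For α_SαS = 1 and a Gaussian standard deviation of √2, this normalization of the positive stable variate makes z a standard Cauchy variable, which gives a simple check: the median of |z| should be close to one.

```python
import math
import random


def positive_stable(alpha, rng):
    """One positive stable variate, 0 < alpha < 1 (Kanter's representation).

    Hypothetical helper; its Laplace transform is exp(-s**alpha).
    """
    u = math.pi * (1.0 - rng.random())   # uniform on (0, pi]
    w = rng.expovariate(1.0)             # unit-mean exponential
    a = math.sin(alpha * u) / math.sin(u) ** (1.0 / alpha)
    b = (math.sin((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    return a * b


def sas_sample(alpha_sas, sigma_g, rng):
    """z = sqrt(x) * g as in (14), with alpha_PaS = alpha_SaS / 2 as in (15).

    Valid for 0 < alpha_sas < 2; x and g are drawn independently.
    """
    x = positive_stable(alpha_sas / 2.0, rng)
    g = rng.gauss(0.0, sigma_g)
    return math.sqrt(x) * g


rng = random.Random(1)
z = [sas_sample(1.0, math.sqrt(2.0), rng) for _ in range(50001)]
median_abs = sorted(abs(v) for v in z)[len(z) // 2]
```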

For some applications, the power estimated from a SαS distributed random process is the random variable of interest. Power or second moment estimates of z can be viewed as PαS random variables when estimated from a sufficiently large number of data samples, N. Using (14), the power or second moment estimate is

$$\widehat{z^2} = \frac{1}{N}\sum_{n=1}^{N} z_n^2 = \frac{1}{N}\sum_{n=1}^{N} x_n\,g_n^2 \tag{16}$$

and the moment conditional on a set of $x_n$ is

$$E\left[\widehat{z^2} \mid x_1, \ldots, x_N\right] = \frac{\sigma_g^2}{N}\sum_{n=1}^{N} x_n$$

where $\sigma_g^2$ is the variance of $g_n$. As each set of $x_n$ is drawn randomly, the sum of N $x_n$ samples is itself PαS distributed (stability property). In practice, N must be large enough to estimate $\sigma_g^2$. Fig. 3 gives an indication of the number of independent samples necessary to produce second moment estimates that are PαS distributed. At each N and α_SαS, 1000 second order moments were estimated using (16) with the SαS independent samples generated using the computer algorithm given in [2]. The α_PαS was then calculated from these power estimates using the moment ratio method. For example, it takes about 32 independent samples at an α_SαS = 1.2 to give an α_PαS = 0.6 as required by (15). As alpha increases, more samples are required.

Fig. 3. PαS alpha calculated from multiple second moment estimates for various SαS alpha (horizontal axis: number of independent data samples, N)

A method for investigating the correlation between PαS random variables is suggested by the negative order moments. The cross correlation coefficient estimated between the two random variables, x and y, is defined as

$$\rho_{xy} = \frac{E\left[x^{-1}y^{-1}\right] - E\left[x^{-1}\right]E\left[y^{-1}\right]}{\sqrt{\left(E\left[x^{-2}\right] - E^2\left[x^{-1}\right]\right)\left(E\left[y^{-2}\right] - E^2\left[y^{-1}\right]\right)}}$$

One can easily show that if x and y are uncorrelated, then $\rho_{xy}$ is zero. (The reverse does not follow.) And if x and y are perfectly correlated, then $\rho_{xy}$ is one. This method can be viewed as the usual correlation definition with a 1/x transformation. Other characteristics and the full usefulness of this coefficient are being investigated. It is applied to the data in the applications.
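A minimal sketch of such a coefficient follows. It assumes the reading given in the text, namely the ordinary correlation coefficient applied after a 1/x transformation; the function name and the exponential test data are illustrative choices, not taken from the paper.

```python
import math
import random


def negative_order_correlation(xs, ys):
    """Ordinary correlation coefficient computed on 1/x and 1/y (assumed form)."""
    u = [1.0 / x for x in xs]
    v = [1.0 / y for y in ys]
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n
    var_u = sum((a - mean_u) ** 2 for a in u) / n
    var_v = sum((b - mean_v) ** 2 for b in v) / n
    return cov / math.sqrt(var_u * var_v)


rng = random.Random(2)
# Any strictly positive data will do for the illustration.
x = [rng.expovariate(1.0) + 0.1 for _ in range(20000)]
y_same = [2.0 * a for a in x]                                  # perfectly correlated with x
y_indep = [rng.expovariate(1.0) + 0.1 for _ in range(20000)]   # independent of x
rho_same = negative_order_correlation(x, y_same)
rho_indep = negative_order_correlation(x, y_indep)
```

Since only negative first- and second-order moments are involved, the estimate remains well behaved even when the mean and mean-square of the data are infinite.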
Fig. 4. Comparison of PDF for alpha=0.6 with PDF estimated from histogram of H-pol sea clutter

3.  Applications 

For the phenomenology of sea clutter, the positive alpha-stable distribution is shown to closely model the modulation or signal power flow in sea clutter samples. An example comparing a sea clutter histogram to a PαS PDF is given in Fig. 4. An X-band radar was used to collect the H-pol clutter data. The data were taken in a low sea state 2 at a 0.9° grazing angle, 0.5 nautical miles offshore looking into the wind. The downrange spatial resolution is 2 meters, and the crossrange resolution is 35.6 meters. The squares of the I and Q samples were averaged over 0.2 seconds to produce the sea clutter power estimates that are compared to the PαS distribution with an alpha of 0.6. The α_PαS estimated using the moment ratio method is 0.612. This compares well to the envelope results given in [3], where an α_SαS of 1.3 is estimated.

The  correlation  between  signal  power  flow  in  sea  clutter 
samples  is  shown  in  Fig.  5.  The  plot  that  peaks  at  zero  time 
delay  is  the  auto  correlation.  Since  radar  returns  are  available 
from  22  downrange  cells  spaced  every  two  meters,  cross 
correlation  estimates  are  possible.  The  other  10  plots 
superimposed  on  Fig.  5  represent  the  cross  correlation  as 
the  channel  pairs  are  separated  by  an  additional  2  meters  for 
each  plot.  The  negative  coefficient  values  may  be  artifacts. 
A  small  amount  of  correlated  noise  appears  to  be  present  at 
zero  time  delay.  The  auto  correlation  decorrelates  in  about  2 
to  3  seconds.  The  clutter  data  decorrelates  significantly  but 
is  still  well  correlated  between  samples  taken  20  meters  apart. 

Fig. 5. Auto and cross correlation coefficients for H-pol sea clutter data sampled every two meters (horizontal axis: time delay in seconds)

Using data available on the World Wide Web as examples, the PαS distribution is shown to be a good candidate for modeling the variabilities in long term energy measurements for seismic activity (α_PαS = 0.564, from moment ratio method) and ocean waves (α_PαS = 0.731, from moment ratio method). The seismic data is from a catalog of events presented by the Southern California Seismographic Network (SCSN) at http://scec.gps.caltech.edu:80/ftp/catalogs/SCSN. The Richter magnitude of the event is converted into the average energy released by the event, and the random variable is this energy accumulated over four tenths of a day (or power). These data were taken from 1/1/96 to 7/30/96. The comparison of the PDF for alpha = 0.55 with the PDF from the histogram of the seismic data is given in Fig. 6. A power-law tail is expected from the empirical Gutenberg-Richter Law; however, the PαS PDF better models the less frequently occurring, smaller seismic events. Similar results were obtained using crumpling paper audio data given at http://garak.msc.cornell.edu/~houle/crumpling/crumpling.html. As shown in Fig. 7, the seismic data appears to fully decorrelate from one sample to the next. Out to 10 days the auto correlation coefficient appears to be slightly positive.

The ocean wave data was taken from a data base presented by the Ocean Data Buoy Center, a part of the National Oceanic and Atmospheric Administration (NOAA), at http://seaboard.ndbc.noaa.gov/historical_data.shtml. The web site presents hourly estimates of the significant wave height calculated from 20 minutes of wave data. The data selected was from Station ID 44014 (off Virginia Beach, VA) from 3/1/96 to 8/2/96. The random variable is the square of the significant wave height, which is proportional to the power in the ocean waves. (Wave height is assumed to be Gaussian over the 20 minutes.) The comparison of the PDF for alpha = 0.75 with the PDF from the histogram of the wave height data is given in Fig. 8. As shown in Fig. 9, the wave height data appears to decorrelate after 48 hours with a secondary peak at about 80 hours.
Fig. 6. Comparison of PDF for alpha=0.55 with PDF estimated from histogram of seismic data


Fig.  7.  Auto  correlation  coefficient  for  seismic  data 


Fig.  8.  Comparison  of  PDF  for  alpha=0.75  with  PDF 
estimated  from  histogram  of  wave  height  data 


References 

[1] C. L. Nikias and M. Shao, Signal Processing with Alpha-Stable Distributions and Applications, John Wiley and Sons, New York, NY, 1995.

[2]  G.  Samorodnitsky  and  M.  S.  Taqqu,  Stable  Non-Gaussian 
Random  Processes:  Stochastic  Models  with  Infinite  Variance, 
Chapman  and  Hall,  New  York,  NY,  1994. 

[3] R. D. Pierce, “RCS Characterization using the alpha-stable distribution,” Proc. 1996 IEEE National Radar Conference, Ann Arbor, MI, 13-16 May, 1996, pp. 154-159.


Fig. 9. Auto correlation coefficient for wave height data (horizontal axis: time delay in hours, 0 to 168)


Administrative  information 

This  project  was  supported  by  the  Carderock  Division  of 
the  Naval  Surface  Warfare  Center's  In-house  Laboratory 
Independent  Research  Program  sponsored  by  the  Office  of 
Naval  Research  and  administrated  by  the  Research  Director, 
Code  0112  under  Program  Element  0601152N  under 
NSWCCD  Work  Unit  1-7340-506. 




Asymptotic  Distribution  of  the  Hermite  Normality  Test 


David  Declercq  Patrick  Duvaut 

ENSEA-ETIS  6,  avenue  du  PONCEAU  95014,  Cergy-Pontoise,  France 

e-mail  :  declercq@ensea.fr 


Abstract 

This paper presents some asymptotical results of the Hermite Normality test previously introduced. We show that the Hermite statistic $S_H$ is distributed under the null hypothesis as a quadratic form of normal variates and under the nonnull hypothesis as normal. The special case of tests with two polynomials is studied in detail. Finally, we give some considerations for the choice of a best Hermite test when prior knowledge is available, and in particular we determine the asymptotically most powerful test for a fixed alternative distribution (the uniform distribution). Those results are supported by simulations.


1  Introduction 

Testing normality is one of the most studied problems in statistics and signal processing, for the obvious reason that the normal distribution is omnipresent in those topics [9] [5] [10]. A major problem of all the tests that have been introduced in the past fifty years is their matching to the nonnormal alternative considered. Some work better than others for a particular type of alternative, while their behaviour is reversed for other types.

This means in particular that when the nonnull hypothesis is not known, one cannot claim that a test is more powerful than another, and the problem of the choice of a test appears.

This remark led us to emphasize that the Hermite Normality test [3] actually defines a wide range of tests, the properties of each depending on certain properties of the underlying distribution; therefore, if we have any prior knowledge of the distribution we have to test (symmetry, tails or conditions on higher cumulants), there exists a best Hermite test that matches this distribution.

After recalling the form of the Hermite Normality test in Section 2, we give its asymptotical behaviour. Section 3 shows that, in the general case of p polynomials, it is distributed as a quadratic form of gaussian variates under the null hypothesis, and Section 4 gives the asymptotical distribution of the test if only two polynomials are considered, both for the null hypothesis and for a particular nonnull one (uniform distribution), supported by numerical simulations. We end this contribution with some remarks on the choice of a best Hermite test and a few cautions if the tested samples are small.

2  Construction  of  the  test 

The Hermite Normality Test has been introduced in [3] and we briefly describe its construction below.

The first step consists in building a nonlinear vector which contains standardised Hermite polynomials of a random variate x:

$$X = \left[H_{i_1}(x), \ldots, H_{i_p}(x)\right]^T \tag{1}$$

where all the subscripts $i_k$ are distinct.

Because of the properties of Hermite polynomials [6], when x is $\mathcal{N}(0,1)$, this vector becomes spherical: it is zero mean and its covariance matrix is the identity, a property that vanishes for nonnormal distributions. Therefore, to test if a random sample is gaussian or not, we apply X to the sphericity statistic

$$S_H = \frac{|R|}{\left[\frac{1}{p}\mathrm{Tr}(R)\right]^{p}} \tag{2}$$

where R is the sample covariance matrix of X, |R| its determinant and Tr(R) its trace.
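As an illustration, the statistic can be computed for the first two standardised Hermite polynomials. The sketch below rests on assumptions that should be read as such: the sphericity statistic is taken in the form |R| / (Tr(R)/p)^p, which reproduces the mean value 0.8163 quoted later for the uniform alternative; R is taken as the sample second-moment matrix of the vector; and the data are assumed to be already centred and scaled to unit variance.

```python
import math
import random


def hermite_statistic_12(samples):
    """Sphericity statistic for the pair (H1, H2), assuming
    S_H = |R| / (Tr(R)/p)^p with p = 2 and R the sample second-moment matrix."""
    n = len(samples)
    r11 = r12 = r22 = 0.0
    for x in samples:
        h1 = x                                # H1(x) / sqrt(1!)
        h2 = (x * x - 1.0) / math.sqrt(2.0)   # H2(x) / sqrt(2!)
        r11 += h1 * h1
        r12 += h1 * h2
        r22 += h2 * h2
    r11, r12, r22 = r11 / n, r12 / n, r22 / n
    det = r11 * r22 - r12 * r12
    half_trace = (r11 + r22) / 2.0
    return det / half_trace ** 2


rng = random.Random(3)
gauss = [rng.gauss(0.0, 1.0) for _ in range(100000)]
unif = [rng.uniform(-math.sqrt(3.0), math.sqrt(3.0)) for _ in range(100000)]
s_gauss = hermite_statistic_12(gauss)   # close to 1 for normal data
s_unif = hermite_statistic_12(unif)     # clearly below 1 for uniform data
```

The statistic never exceeds one, since the determinant of a positive semi-definite matrix is bounded by the p-th power of its mean eigenvalue; departures from normality show up as values noticeably below one.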

3 Asymptotical distribution of $S_H$ under $\mathcal{H}_0$

Although our test has a sphericity structure, we cannot apply the results pointed out by Anderson [1] because they consider the sphericity statistic with a normal vector, and the one we consider (1) is not.

We therefore have to introduce a limit theorem derived with the help of two theorems from Borovkov ([2], p. 44) with tensorial notations. If A is a matrix in M(p,p), A* will denote its dual form, i.e. a morphism of $\mathcal{L}(M(p,p), \mathbb{R})$, and ⊗ is the classical tensor product.

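Theorem 1
Let x be a random variable with the distribution $F_0$ and consider the two functions

$$G : \mathbb{R} \to M(p,p),\; x \mapsto G(x) \qquad h : M(p,p) \to \mathbb{R},\; T \mapsto h(T)$$

where M(p,p) is the space of square real-valued matrices of size p. Let furthermore h(T) be continuous at $A = \int G(x)\,dF_0(x)$, and let the tensor of covariance $\Sigma = \int (G - A) \otimes (G - A)^{*}\,dF_0(x)$ be finite. We then have the following result for

$$S_N(x) = h\!\left(\frac{1}{N}\sum_{n=1}^{N} G(x_n)\right)$$

$$\sqrt{N}\,\left(S_N(x) - h(A)\right) \xrightarrow{d} h'(A)^{*} \otimes \xi$$

where h'(T) is the matrix of derivatives of h(T) and ξ is a matrix whose components are centered normal with the tensor of covariance Σ.

If $h'(A)^{*} \otimes \xi = 0$ and the tensor of the second order derivatives h''(T) is finite at A, then

$$N\,\left(S_N(x) - h(A)\right) \xrightarrow{d} \frac{1}{2}\sum_{i,j,r,s} \xi_{ij}\,\xi_{rs}\,\frac{\partial^2 h(A)}{\partial t_{ij}\,\partial t_{rs}} \tag{3}$$

The proof is straightforward, as it is just an extension of Borovkov's theorems. We then have to introduce some notations in order to apply this theorem to the Hermite test:

$$g_{ij}(x) = H_i(x)\,H_j(x) \quad \forall (i,j) \in \{1, \ldots, p\}^2$$

$$G = \left[g_{ij}(x)\right] \quad \text{(symmetrical matrix)}$$

$$A = E\left[G(x)\right] = \left[a_{ij}\right]$$

$$R = \frac{1}{N}\sum_{n=1}^{N} G(x_n) \quad \text{is the sample covariance matrix of (1)}$$

$$\Sigma = E\left[(G - A) \otimes (G - A)^{*}\right] = \left[\sigma_{ij}^{rs}\right], \quad \sigma_{ij}^{rs} = E\left[\left(g_{ij}(x) - E[g_{ij}(x)]\right)\left(g_{rs}(x) - E[g_{rs}(x)]\right)\right], \quad 1 \le i,j,r,s \le p$$

$$S_H = h(R) = \frac{|R|}{\left[\frac{1}{p}\mathrm{Tr}(R)\right]^{p}} \tag{4}$$

With these notations, we have proved the following theorem.

Theorem 2
Distribution of $S_H$ under the null hypothesis
Let $x \in \mathcal{N}(0,1)$; the Hermite Normality statistic (2) verifies

$$S_H(x) \xrightarrow{a.s.} 1$$

and has the asymptotical behaviour

$$N\,\left(1 - S_H(x)\right) \xrightarrow{d} \frac{p-1}{2p}\sum_{i} \xi_{ii}^2 - \frac{1}{p}\sum_{i>j} \xi_{ii}\,\xi_{jj} + \sum_{i>j} \xi_{ij}^2$$

where $\xi = [\xi_{ij}] \in \mathcal{N}(0, \Sigma)$ and the terms of Σ are

$$\sigma_{ij}^{rs} = E\left[H_i(x)H_j(x)H_r(x)H_s(x)\right] - \delta_{ij}\,\delta_{rs} \tag{5}$$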

For obvious reasons of space, we give here only the main steps of the proof, which can be found in [4].

1. A = I implies that $S_N(x) \to h(A) = 1$.

2. With the matrix of first order derivatives of h(G), we found that h'(A) = 0, and we need to calculate the tensor of second order derivatives of h(G).

3. This tensor contains 7 types of terms:

$$\frac{\partial^2 h(A)}{\partial g_{ii}^2} = \frac{-p+1}{p}, \qquad \frac{\partial^2 h(A)}{\partial g_{ii}\,\partial g_{jj}} = \frac{1}{p}, \qquad \frac{\partial^2 h(A)}{\partial g_{ij}^2} = -2$$

$$\frac{\partial^2 h(A)}{\partial g_{ii}\,\partial g_{ij}} = \frac{\partial^2 h(A)}{\partial g_{ii}\,\partial g_{rs}} = \frac{\partial^2 h(A)}{\partial g_{ij}\,\partial g_{ir}} = \frac{\partial^2 h(A)}{\partial g_{ij}\,\partial g_{rs}} = 0$$

This yields the result with the use of Theorem 1.

4. The last step is to derive the expressions of the terms $\sigma_{ij}^{rs}$ appearing in Σ, which can be calculated using combinatorial relations on Hermite polynomials.

The Hermite test is asymptotically distributed as a quadratic form of centered gaussian variates whose covariance coefficients depend on $\sigma_{ij}^{rs}$. In the general case, we cannot obtain a closed form for the pdf of this kind of distribution. However, one can find some results on the distribution of quadratic forms in gaussian variates in [8]. One important remark is that the convergence of the Hermite test is good, decreasing as 1/N when many tests have a convergence in $1/\sqrt{N}$.

4 Special case of two polynomials


An important case is when only two polynomials are considered, because it is widely used in practice and in simulations. The main reason is that Hermite tests with two polynomials seem to be more robust when the sample size is small (N < 50).

4.1 distribution under $\mathcal{H}_0$

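In this section, we just derive the quadratic form given in Theorem 2 for two Hermite polynomials $H_m(x)$ and $H_n(x)$ and specify the limit distribution with the results in [8]. The limit distribution is that of a quadratic form Q in two centered gaussian variates, which has the characteristic function

$$\varphi_Q(z) = \frac{1}{\sqrt{1 - 2iz\lambda_1}\,\sqrt{1 - 2iz\lambda_2}}$$

where $\lambda_1$ and $\lambda_2$ are the latent roots of the covariance matrix of these variates.

We can therefore easily calculate the cumulants of Q under $\mathcal{H}_0$, and Rice ([11], p. 99) proposed to approximate the pdf of Q by a type III Pearson distribution (namely the gamma distribution) with the same mean and variance:

$$f_Q(t) \approx \frac{s^{r}\,t^{\,r-1}\,e^{-st}}{\Gamma(r)}, \qquad r = \frac{\kappa_1^2}{\kappa_2}, \quad s = \frac{\kappa_1}{\kappa_2}$$

where $\kappa_1$ and $\kappa_2$ denote the mean and variance of Q.

Figure 1. Distribution of $N(1 - S_H)$ under $\mathcal{H}_0$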
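The moment matching behind this gamma approximation is easy to sketch. The code assumes that Q is a sum of two independent squared normal variates weighted by the latent roots, so that its mean is λ1 + λ2 and its variance 2(λ1² + λ2²); the function names and the numerical roots used in the example are illustrative and not taken from the paper.

```python
import math


def gamma_match(lambda1, lambda2):
    """Shape r and rate s of the gamma law sharing the mean and variance of Q."""
    mean = lambda1 + lambda2                    # first cumulant of Q
    var = 2.0 * (lambda1 ** 2 + lambda2 ** 2)   # second cumulant of Q
    return mean ** 2 / var, mean / var


def gamma_pdf(t, r, s):
    """Gamma density s^r t^(r-1) exp(-s t) / Gamma(r), for t > 0."""
    log_pdf = r * math.log(s) + (r - 1.0) * math.log(t) - s * t - math.lgamma(r)
    return math.exp(log_pdf)


r, s = gamma_match(2.0, 5.0)          # illustrative latent roots
r_eq, s_eq = gamma_match(1.0, 1.0)    # equal roots: Q is chi-square with 2 degrees of freedom
```

With equal roots the approximation is exact, since the gamma law with r = 1 and s = 1/2 is the chi-square distribution with two degrees of freedom; for unequal roots only the first two cumulants are matched.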


In order to give a numerical example of those results, we have considered the first two Hermite polynomials $H_1(x)$ and $H_2(x)$ and compared the asymptotical distribution (6) with the histogram of 100000 samples of $N(1 - S_H(x))$ for large samples (N = 10000), both drawn in Figure 1.


4.2 distribution under $\mathcal{H}_1$


The ability to obtain the limit distribution under a particular alternative depends only on the calculation of A and Σ (4), and the knowledge of the cumulants of x makes it possible. We therefore have, using Theorem 1, the following result under any nonnull hypothesis.

Theorem 3
Distribution of $S_H$ under the nonnull hypothesis
Let $x \in f(x)$ be a nonnormal random variable. Then

$$S_H(x) \xrightarrow{a.s.} h(A)$$

$$\sqrt{N}\,\left(S_H(x) - h(A)\right) \xrightarrow{d} h'(A)^{*} \otimes \xi, \qquad \xi \in \mathcal{N}(0, \Sigma)$$

$S_H$ is asymptotically normal with a $1/\sqrt{N}$ rate of convergence, with mean h(A) and variance $\sigma^2 = h'(A)^{*} \otimes \Sigma \otimes h'(A)$.

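As an example, we will give in this section the asymptotical distribution of $S_H^{(m,n)}$ under the uniform hypothesis: $x \in \mathcal{U}([-\sqrt{3}, \sqrt{3}])$. Using some combinatorial relations on the product of 2, 3 and 4 Hermite polynomials (see [7] and the references within), one can derive the results (7), (8) and (9)

$$E\left[H_n(x)\right] = \frac{H_{n+1}(\sqrt{3}) - H_{n+1}(-\sqrt{3})}{2\sqrt{3}\,(n+1)} \tag{8}$$

$$E\left[\frac{H_m(x)}{\sqrt{m!}}\,\frac{H_n(x)}{\sqrt{n!}}\right] = \frac{1}{\sqrt{m!\,n!}}\sum_{k=0}^{\min(m,n)} k!\binom{m}{k}\binom{n}{k}\,E\left[H_{m+n-2k}(x)\right] = a_{mn} \tag{9}$$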

With those relations, we can derive the expressions of h(A) and $\sigma^2$ which appear in the limit distribution of $S_H$. As a comparison with the numerical example given under $\mathcal{H}_0$, we consider the Hermite test with the first two polynomials. We obtain the mean of $S_H$: h(A) = 0.8163, and its asymptotical variance $\sigma^2$ = 0.0979. Figure 2 draws the asymptotical gaussian density $\mathcal{N}(0, 0.0979)$ and the histogram of 100000 samples of $\sqrt{N}\,(S_H - 0.8163)$ with N = 10000.
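The mean value quoted above can be checked analytically with a short sketch. It relies on assumptions made for illustration: the polynomials are the probabilists' Hermite polynomials, for which the antiderivative relation in (8) holds; products are expanded with the standard linearization formula for these polynomials; and the statistic is taken in the form |A| / (Tr(A)/p)^p. All function names are hypothetical.

```python
import math


def hermite(n, x):
    """Probabilists' Hermite polynomial He_n(x), via He_{k+1} = x He_k - k He_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, x
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h


def mean_hermite_uniform(n):
    """E[He_n(x)] for x uniform on [-sqrt(3), sqrt(3)], following (8)."""
    a = math.sqrt(3.0)
    return (hermite(n + 1, a) - hermite(n + 1, -a)) / (2.0 * a * (n + 1))


def a_entry(m, n):
    """E[He_m(x) He_n(x)] / sqrt(m! n!), using the linearization of the product."""
    total = 0.0
    for k in range(min(m, n) + 1):
        coeff = math.factorial(k) * math.comb(m, k) * math.comb(n, k)
        total += coeff * mean_hermite_uniform(m + n - 2 * k)
    return total / math.sqrt(math.factorial(m) * math.factorial(n))


# Matrix A for the first two polynomials and the resulting limit h(A).
a11, a12, a22 = a_entry(1, 1), a_entry(1, 2), a_entry(2, 2)
h_A = (a11 * a22 - a12 ** 2) / ((a11 + a22) / 2.0) ** 2
```

Under these assumptions the entries come out as a11 = 1, a12 = 0 and a22 = 0.4, so that h(A) = 0.4 / 0.49, in agreement with the value 0.8163 given in the text.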
Figure 2. Distribution of $\sqrt{N}\,(S_H - h(A))$ under $\mathcal{H}_1$


5 On the choice of a best test of normality

Since we usually have very little prior knowledge about the distribution of a sample on which a decision must be taken (in our case, normal or not), it is of great interest to know exactly the performance of the Hermite Normality test under a fixed alternative. In this section we will focus only on this aspect, keeping in mind that when nothing is known about the signal, those considerations vanish.

5.1  asymptotical  study 

As seen before, we are able to give the limit distribution of the Hermite Normality test for any alternative hypothesis provided we know the cumulants of the distribution. We can thereby obtain this distribution for all the tests with two polynomials and use a classical Neyman-Pearson type decision to find the Hermite test with 2 polynomials that is asymptotically the most powerful for the uniform alternative.

For that purpose, we have to maximize a correct detection probability (CDP) with a fixed false alarm probability (FAP). These two probabilities are defined as

$$\mathrm{CDP} = \mathrm{Prob}\{\text{choose } \mathcal{H}_1 \mid \mathcal{H}_1\} = \mathrm{Prob}\{S_H < \eta\}, \quad \sqrt{N}\,(S_H - h(A)) \in \mathcal{N}(0, \sigma^2) \tag{10}$$

$$\mathrm{FAP} = \mathrm{Prob}\{\text{choose } \mathcal{H}_1 \mid \mathcal{H}_0\} = \mathrm{Prob}\{S_H < \eta\}, \quad N\,(1 - S_H) \text{ gamma distributed under } \mathcal{H}_0 \tag{11}$$

where η is the decision threshold.

For computational reasons, we cannot explore the
combinations of polynomials beyond the 6th degree.

We  have  found  that  the  three  best  tests  are 

«(;■=>  (12) 

This result is of course valid only under the hypotheses
of the approach described here, about which we want
to make some remarks:

• it is an asymptotic study: in the next section we
verify whether the test selected above is actually the
most powerful under the uniform hypothesis for small
sample sizes.

• the maximum degree of our investigation is limited
to 6. This is not a concern, because the robustness of
the Hermite tests when the sample size decreases
is rather poor if large degree polynomials are
considered.

• we consider only two polynomials: this is still a
work in progress, and if we can follow the method
presented here to give the limit distribution for
more than two polynomials, we hope to give soon
more general criteria for choosing the set of
polynomials that best fits a particular alternative.

5.2 Small sample results

It is important to verify whether the asymptotic
behaviour of a test statistic still holds when it is applied
to small samples. In the case of the Hermite
Normality test, we want to know if the result found on
the most powerful test under a particular alternative is
still true. For that purpose, we have made power
simulations based on samples of size $N \in \{20, 30, 50\}$ at the 5%
level of significance, that is, with a 0.05 probability
of error under the null hypothesis. We have considered
the three tests designated asymptotically as the most
powerful (12) and three others chosen arbitrarily. In
[3], we have compared the Hermite Normality test
under many alternatives with three well-known tests taken
from the literature.

Table 1 gives the 5% quantiles of the Hermite tests for
each N, computed with 500000 samples of $S_H$; more
complete tables are reported in [4].


N     5% quantiles of the six Hermite tests
20    0.278   0.128   0.079   0.619   0.427   0.158
30    0.379   0.156   0.107   0.711   0.492   0.223
50    0.507   0.203   0.159   0.802   0.559   0.322

Table 1. 5% quantiles of six Hermite tests

The power of the Hermite test is then the probability
(in percent) that the statistic computed from a sample
distributed as $\mathcal{U}([-\sqrt{3}, \sqrt{3}])$ falls in the rejection
region defined by the corresponding quantile.
Those powers are estimated with 5000 samples of size
N and are given in Table 2.


N     power (%) of the six Hermite tests
20    57    6   23   19    4   39
30    79   16   39   32    7   59
50    96   64   67   59   27   88

Table 2. Power of the Hermite test under $H_1$
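The procedure behind Tables 1 and 2 (calibrate a threshold under the null hypothesis, then count rejections under the uniform alternative) can be sketched as follows. This is a hedged illustration only: the statistic `hermite_stat` below is a hypothetical stand-in built from normalised Hermite sample means, not the $S_H$ of the paper, and it rejects for large values, which is also an assumption. The sample counts are reduced to keep the run short.

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He


def hermite_stat(x, degrees=(3, 4)):
    """Hypothetical statistic: N * sum_k (mean of He_k(x) / sqrt(k!))**2."""
    x = (x - x.mean()) / x.std()          # standardise the sample
    total = 0.0
    for k in degrees:
        coeffs = np.zeros(k + 1)
        coeffs[k] = 1.0
        total += np.mean(He.hermeval(x, coeffs) / np.sqrt(factorial(k))) ** 2
    return x.size * total


def normal(rng, n):
    return rng.standard_normal(n)


def uniform(rng, n):
    return rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n)


def simulate(stat, sampler, n, trials, rng):
    """Draw `trials` independent samples of size n and return the statistics."""
    return np.array([stat(sampler(rng, n)) for _ in range(trials)])


rng = np.random.default_rng(0)
N = 50
# Step 1: threshold at the 5% level, estimated under the Gaussian hypothesis
threshold = np.quantile(simulate(hermite_stat, normal, N, 5000, rng), 0.95)
# Step 2: empirical size (fresh Gaussian samples) and power (uniform samples)
size = np.mean(simulate(hermite_stat, normal, N, 5000, rng) > threshold)
power = np.mean(simulate(hermite_stat, uniform, N, 5000, rng) > threshold)
print(f"threshold={threshold:.3f}  size={size:.3f}  power={power:.3f}")
```

The empirical size should stay close to 0.05 by construction; the power depends entirely on the chosen statistic and sample size, which is the point made by the comparison of the six tests.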

First, the asymptotic study gives good criteria,
because the chosen tests are the most powerful
Hermite tests (with two polynomials), even for very
small samples. Looking more closely at the two tests
$S_H^{(\ldots)}$ and $S_H^{(\ldots)}$, we see that they
recover their asymptotic order (in terms of power)
as the sample size grows. We finally remark
that the power of $S_H^{(\ldots)}$ grows rapidly with N, which
encourages us to find more general criteria for the
choice of a test, including those with more than two
polynomials.

We furthermore have to take care with small samples:
the simulations for N = 20 exhibit powers too small to
allow us to comment on them; even with samples
of size 30, one can easily deduce that the test is
relevant only for $S_H^{(\ldots)}$ and $S_H^{(\ldots)}$. As a matter of
fact, if we take a look at the 4th column of Table 2,
$S_H^{(\ldots)}$ falls almost surely in the Gaussian area of $S_H$:
from the point of view of this particular test statistic, one can
consider that the uniform distribution is very close to
the normal one!

6  Conclusion 

We gave in this paper the limit distribution of the
Hermite Normality test under the null hypothesis in the
general case of p polynomials, and in a particular case
(p = 2) under both the null and non-null hypotheses. The
fixed alternative of normalised uniformity is studied,
and numerical simulations support those results. The
problem of the choice of a test statistic for a particular
alternative is discussed and an answer is given for the
uniform distribution. We first find the asymptotically
most powerful Hermite test and then verify whether it
is still the most powerful for small samples ($N \le 50$).
Caution is required if the samples are really too small
and for some test statistics. Future work will
first give general choice criteria among the Hermite
Normality test family and second propose some rules
to apply when nothing is known about the underlying
distribution, in order to avoid the problem encountered
at the end of the last section: an almost surely bad
decision on the tested distribution.
Abstract

In many applications it is necessary to characterize the
statistical properties of the wavelet/wavelet packet
coefficients of a stationary random signal. This problem is
typically encountered in denoising methods using wavelet
packets. Then, in a stationary non-Gaussian noise scenario, it
may be useful to determine the high-order statistics of the
wavelet packet coefficients. In this work, we prove that
this task may be performed through multidimensional
filter banks. In particular, we show how the cumulants of
the M-band wavelet packet coefficients of a strictly
stationary signal are derived from those of the signal and we
provide recursive decomposition and reconstruction
formulae to compute the cumulants of these coefficients.
High-order wavelet packets, associated with multidimensional filter
banks, are presented along with some of their properties.
Finally, the asymptotic normality of the coefficients is proved.


1.  Introduction 

Wavelet/wavelet packet decompositions are becoming
popular in statistical applications such as modelling,
detection or estimation of observed noisy processes [6, 5]. While
in the classical additive model the stationary noise is
generally assumed to be Gaussian i.i.d., in some situations it
becomes necessary to depart from this hypothesis
[3, 9]. In such cases, determining the high-order statistics
of the wavelet packet coefficients of the process may
consequently be of interest.

In this paper, we present some results concerning the
statistical properties of the M-band wavelet packet coefficients
of a strictly stationary random signal. In particular, we prove
that the cumulants of the coefficients may be computed
recursively from those of the signal, and derive the
decomposition and reconstruction formulae for the cumulant field.
We show that these operations may be realized through
multidimensional filter banks, and present some properties of
the associated high-order wavelet packets along with the
connections with frame multiresolution analysis [1]. The
asymptotic normality of the coefficients is finally obtained
under weak conditions. We note that the proofs of the
results given in this paper may be found in [7].

2.  Hypotheses 

We consider the general case of an M-band orthogonal
wavelet packet decomposition [12, 11] corresponding to a
paraunitary perfect reconstruction filter bank with impulse
responses $(h_0(k))_{k\in\mathbb{Z}}, \ldots, (h_{M-1}(k))_{k\in\mathbb{Z}}$. The associated
wavelet packet functions will be denoted in the sequel by
$(W_m(t))_{m\in\mathbb{N}}$. We also assume that the considered process
$x(t)$, $t \in \mathbb{R}$, is strictly stationary and verifies the following
condition:

$\exists\,\gamma \in \ldots,\ \forall n \in \mathbb{N}^*,\quad \sup |\ldots(f_1, \ldots, f_n)| < \ldots$

where $\ldots(f_1, \ldots, f_n)$ denotes the polyspectrum of
order $(n+1)$ of $x(t)$. Note that this assumption implies in
particular that the polyspectra should be bounded, which is
related to mixing properties of the process [2]. The wavelet
packet coefficients of the process $x(t)$ at resolution level $j$ and
m-th frequency bin are given by

It can be shown that this integral is convergent in the mean
square sense when $x(t)$ is stationary and $W_m(t) \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$,
$m \in \mathbb{N}$.

3. High-order wavelet packets

We first introduce the notion of high-order wavelet
packets and subsequently investigate some of their properties.
For $n \in \mathbb{N}^*$ and $(t_1, \ldots, t_n) \in \mathbb{R}^n$ we define the wavelet
packets of order $(n+1)$ by:

$$W^{(n+1)}_{m_1,\ldots,m_{n+1}}(t_1, \ldots, t_n) = \int_{-\infty}^{\infty} W_{m_1}(u)\, W_{m_2}(u + t_1) \cdots W_{m_{n+1}}(u + t_n)\, du,$$

and point out that these functions can be viewed as
extensions of the autocorrelation wavelet functions introduced by
Saito in [10]. An example of high-order scaling function of
order 3 is shown in Fig. 1. Throughout the paper, we will
also use the notation:


Figure 1. High-order scaling function of order
3 based on Daubechies 8.

Recall that $W_0(t)$ corresponds to the scaling function
usually denoted by $\varphi(t)$ and associated with a multiresolution
analysis (MRA) $((V_j)_{j\in\mathbb{Z}}, \varphi)$ of $L^2(\mathbb{R})$ [8]. For the sake of
clarity, we first recall the basic definitions concerning frames
and frame multiresolution analysis (FMRA).^1

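Definition 1 Let $\mathcal{H}$ be a complex separable Hilbert space.
A sequence $\{g_n, n \in \mathbb{Z}\} \subset \mathcal{H}$ is a frame of $\mathcal{H}$ if there exist
two constants $A, B > 0$ such that

$\forall f \in \mathcal{H},\quad A\,\|f\|^2 \le \sum_{n} |\langle f, g_n \rangle|^2 \le B\,\|f\|^2,$

where $A$, $B$ are the frame bounds, $\langle\cdot,\cdot\rangle$ denotes the inner
product on $\mathcal{H}$, and $\|f\| = \langle f, f \rangle^{1/2}$ stands for the norm of
$f$. A frame $\{g_n, n \in \mathbb{Z}\}$ is tight if $A = B$. It is exact
if $\{g_n, n \in \mathbb{Z}\}$ is a Riesz basis.

^1 For general discussions on these concepts, we refer the reader to [4, 1].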
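As a small finite-dimensional illustration of the frame inequality (not taken from the paper), the optimal frame bounds of a finite family of real vectors are the extreme eigenvalues of the frame operator $S = \sum_n g_n g_n^T$. The sketch below uses two hypothetical example families in $\mathbb{R}^2$: three unit vectors at 120 degrees, which form a tight frame, and an orthonormal basis with one repeated vector, which does not.

```python
import numpy as np


def frame_bounds(vectors):
    """Optimal frame bounds (A, B) of a finite real frame, one vector per row.

    For every f, sum_n <f, g_n>^2 = f^T S f with S = G^T G, so A and B are
    the smallest and largest eigenvalues of S.
    """
    G = np.asarray(vectors, dtype=float)
    S = G.T @ G
    eigenvalues = np.linalg.eigvalsh(S)   # sorted in ascending order
    return eigenvalues[0], eigenvalues[-1]


# Three unit vectors at 120 degrees from each other: tight frame
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
tight = np.column_stack([np.cos(angles), np.sin(angles)])
A_tight, B_tight = frame_bounds(tight)

# Orthonormal basis with the first vector repeated: frame, but not tight
redundant = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
A_red, B_red = frame_bounds(redundant)

print(A_tight, B_tight)   # equal bounds
print(A_red, B_red)       # different bounds
```

In the first case the two bounds coincide at 3/2, which matches the definition of a tight frame; in the second they are 1 and 2.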

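Definition 2 A FMRA $((\Omega_j)_{j\in\mathbb{Z}}, \gamma)$ of $L^2(\mathbb{R})$ is defined by
a sequence of nested closed linear subspaces $\Omega_j \subset \Omega_{j-1}$
and an element $\gamma \in \Omega_0$ verifying:

• $\overline{\bigcup_{j\in\mathbb{Z}} \Omega_j} = L^2(\mathbb{R})$ and $\bigcap_{j\in\mathbb{Z}} \Omega_j = \{0\}$,

• $f(t) \in \Omega_{j+1} \Leftrightarrow f(2t) \in \Omega_j$,

• $f(t) \in \Omega_0 \Rightarrow f(t - k) \in \Omega_0$ for all $k \in \mathbb{Z}$,

• $\{\gamma(t - k), k \in \mathbb{Z}\}$ is a frame of $\Omega_0$.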

A  first  interesting  result  is  the  following: 

Proposition 1 Under some weak assumptions, the
autocorrelation function of the scaling function defines a FMRA.

As an example in the 2-band case, the B-spline function of
order 2 is obtained by considering the Haar scaling function
$W_0(t) = 1_{[0,1]}(t)$. In the general case ($n > 1$), the
high-order scaling functions do not necessarily
lead to a FMRA of $L^2(\mathbb{R}^n)$, depending on the involved
scaling function $W_0(t)$.
For instance, we remark that the high-order Shannon
scaling function and its integer translates constitute a frame of
the corresponding subspace and therefore lead to a FMRA of
$L^2(\mathbb{R}^n)$ for all $n \in \mathbb{N}^*$, in contrast with the scaling
functions corresponding to Meyer filters.

However, the following nested non-orthogonal subspace
properties are obtained:

Proposition 2 For all $n \in \mathbb{N}^*$, we have:


o(n+l)  _  ^(n+1) 

,Mm+pn+i)^ 

Pli‘}Pn+l 


with $(p_1, \ldots, p_{n+1}) \in \{0, \ldots, M-1\}^{n+1}$.

We can consequently show that the high-order wavelet
packets satisfy multidimensional two-scale equations given by:


,IV„ (h  -  ki, . . .  ,tn  -  kn), 


where

defines the $(n+1)$-th order deterministic cross-correlation
sequence of the filter bank.

The high-order wavelet packets also satisfy the following
reconstruction formulae:


I  —— 


(tf-'. . 


We  now  present  one  of  our  main  results: 

Proposition 3 The multidimensional filter bank defined by
the impulse responses of the n-dimensional filters
introduced above satisfies the following property:

Pi)-*-  )Pn+l  il,--*  fin 

//Ipj  ,/lp^^l  V 

M<5(fcl  -  fc'i)  .  .  -  fcj.)  • 

Although this property only comes from the orthonormality
of the wavelet packet bases, it is interesting to note that
this result also appears as a consequence of the existence
of a dual frame of order $(n+1)$ in the case where the
high-order scaling function defines a FMRA.

Moreover, we deduce that the associated sequence of translated
functions is a frame of the corresponding subspace, with frame
bounds $A M^n$ and $B M^n$.
As a consequence of Proposition 3, we however see
that for all $n \in \mathbb{N}^*$, the multidimensional sequence of filters
translated by $(k_1 - M l_1, \ldots, k_n - M l_n)$, $(l_1, \ldots, l_n) \in \mathbb{Z}^n$,
is a tight frame, with equal frame bounds $A = B$.

Recursive computations of the cumulants through the
previously exhibited structure are presented in the next section
along with some related properties.


4.  Cumulant  field  analysis 


We first recall that the wavelet packet coefficients
$(c_{j,m}(k))_{k\in\mathbb{Z}}$ correspond to stationary and cross-stationary
processes whose cumulants are given by the following
property:


Proposition 4 For all $n \in \mathbb{N}^*$ and $(m_1, \ldots, m_{n+1}) \in \mathbb{N}^{n+1}$,
the cross-cumulants of order $(n+1)$ of the wavelet
packet coefficients $(c_{j,m_1}(k))_{k\in\mathbb{Z}}, \ldots, (c_{j,m_{n+1}}(k))_{k\in\mathbb{Z}}$
are obtained as the n-dimensional inner products of the
cumulants of order $(n+1)$ of the analyzed signal with
dilated/translated versions of the high-order wavelet packets.

Figure 2. Multidimensional analysis filter
bank.


From the two-scale equation presented in the previous
section, a scale-recursive algorithm involving the
multidimensional structure exhibited there may be
proposed to efficiently compute the cumulants:

Proposition 5 The cumulants of the wavelet packet
coefficients $(c_{j+1,m}(k))_{k\in\mathbb{Z}}$ at resolution level $(j+1)$ are
obtained from those of the coefficients $(c_{j,m}(k))_{k\in\mathbb{Z}}$ through
the multidimensional filter bank involving the n-dimensional
filters introduced in Section 3 and decimators
by a factor M in each of the n dimensions (see Fig. 2).

This decomposition property actually corresponds to a
multidimensional multiresolution analysis of the cumulant field.
From the results in Section 3, we easily find a dual
multidimensional reconstruction filter bank to the one considered in
Proposition 5 (see Fig. 3).

Figure 3. Multidimensional synthesis filter
bank.


It can also be shown that the cumulants satisfy the following
quadratic norm conservation property:

$$\sum_{p_1,\ldots,p_{n+1}}\ \sum_{l_1,\ldots,l_n} \mathrm{cum}^2\big[c_{j+1,Mm+p_1}(l),\, c_{j+1,Mm+p_2}(l + l_1),\, \ldots,\, c_{j+1,Mm+p_{n+1}}(l + l_n)\big] = M \sum_{k_1,\ldots,k_n} \mathrm{cum}^2\big[c_{j,m}(k), \ldots, c_{j,m}(k + k_n)\big].$$

In the next section, we apply the previous properties to
obtain an asymptotic convergence result for the wavelet packet
coefficients.

5.  Convergence  in  distribution 

By using Proposition 5 in the frequency domain, we
finally show that the statistics of order greater than 2 of the
wavelet packet coefficients decay exponentially with respect
to the resolution level:

Proposition 6 The cumulants of order $(n+1) > 2$ of the
wavelet packet coefficients at resolution level $j$ are
upper-bounded as follows:

$|\mathrm{cum}[c_{j,m}(k), c_{j,m}(k + k_1), \ldots, c_{j,m}(k + k_n)]| \le \ldots$

where $C$ is independent of $n$ and $\lfloor \cdot \rfloor$ denotes the greatest
integer lower than or equal to its argument.

The rate of convergence $C$ is related to the frequency
responses $H_i(f)$, $i \in \{0, \ldots, M-1\}$, of the filter bank and
may be analytically derived for different families of wavelet
packets (e.g., $C = \ldots$ for Walsh-Hadamard filters). As
a simple consequence of this result, we check that the
processes $(c_{j,m}(k))_{k\in\mathbb{Z}}$ converge in distribution to Gaussian
processes when $j$ tends to infinity. It should be noted that, in
the dyadic wavelet decomposition case, this latter fact was
also observed by other authors [3, 9] under different
assumptions.
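The decay of the higher-order statistics with the resolution level can be observed in a simple discrete-time experiment. The sketch below is an illustration under stated assumptions, not the construction of the paper: an i.i.d. Laplacian sequence stands in for the finest-scale coefficients, and only the low-pass branch of a 2-band Haar (Walsh-Hadamard) filter bank is iterated. For i.i.d. inputs with fourth cumulant $\kappa_4$, the output $(a + b)/\sqrt{2}$ keeps the variance and has fourth cumulant $\kappa_4/2$, so the excess kurtosis should roughly halve at each level.

```python
import numpy as np


def excess_kurtosis(x):
    """Normalised fourth cumulant: E[x^4] / E[x^2]^2 - 3 for centred data."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0


def haar_lowpass_step(c):
    """One 2-band Haar analysis step: low-pass filter, then decimate by 2."""
    c = c[: 2 * (c.size // 2)]            # drop a trailing odd sample
    return (c[0::2] + c[1::2]) / np.sqrt(2.0)


rng = np.random.default_rng(1)
c = rng.laplace(size=2**18)               # i.i.d., excess kurtosis 3
kurt = [excess_kurtosis(c)]
for level in range(4):
    c = haar_lowpass_step(c)
    kurt.append(excess_kurtosis(c))

for level, value in enumerate(kurt):
    print(f"level {level}: excess kurtosis ~ {value:.3f}")
```

The printed values should shrink towards zero as the level increases, which is the discrete-time counterpart of the convergence to Gaussianity stated above. The high-pass branch, $(a - b)/\sqrt{2}$, behaves in the same way for i.i.d. inputs.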

6.  Conclusion 

In this paper, we have presented some tools to determine
the high-order statistics of the M-band wavelet packet
coefficients of a strictly stationary process. Multidimensional
multiresolution analysis of the cumulant field is performed
through an n-dimensional filter bank whose properties have
been investigated. The asymptotic normality of the wavelet
packet coefficients has finally been presented.
Abstract

In this contribution we develop a procedure for deciding
whether a finite segment of a signal can be considered
as a realization of a Gaussian process or not. We first
present the theoretical bases leading to the formulation
of the Bussgang test. A novel test based on the sign
non-linearity is introduced and its performance analysed in
comparison with two simple Gaussianity tests presented
in the literature. The results have shown a very promising
behaviour of the suggested method when only a small
amount of data is available, in both the cases of white and
correlated samples.


I  Introduction 

Deciding whether a finite segment of a signal can be
considered as a realization of a Gaussian process or not
is an inferential problem useful in several applications of
digital signal processing. This decision is preliminary to
other signal processing activities, such as recognizing the
existence of statistical information recoverable by higher order
statistics, or detecting the existence of useful signals in
measurements affected by Gaussian noise.

Some classical approaches to this problem are based on
frequency domain tests, which in general require large
sample sets. In order to improve the detectability for relatively
short signal segments, recent contributions have
focused on time domain tests. Basically, these techniques
consist of measuring the distance between the expected
values (calculated for the Gaussian hypothesis) and the
sample averages after some non-linear transformation of
the series. It is known that a quadratic form built on the
sample deviations from the expected value is asymptotically
chi-square distributed under the Gaussian hypothesis.
This allows one to determine, in principle, the decision threshold
to apply to the value returned by the quadratic form in order to
obtain a desired significance level (probability of rejecting the
Gaussian hypothesis for truly Gaussian series).

The simplest versions of these tests aim to
verify the Gaussianity of the marginal distribution of the
samples. Of course, specific non-linearities are able to
detect specific deviations from Gaussianity. For instance, the
third power reveals inconsistent skewness values, whereas
the fourth power reveals anomalies of the kurtosis.
Non-Gaussian distributions having skewness and kurtosis close
to the values pertaining to the Gaussian case are
undetectable by these tests.

Likewise, the (complex) exponential non-linearities
employed in the characteristic function oriented tests [1, 2]
may be unable to detect some other specific non-Gaussian
behaviors. In order to deal with as many cases as
possible in practical situations where the nature of the measured
samples is totally unknown, composite tests based on sets
of non-linearities have been proposed and characterized in
the recent literature. The higher-order moment approach
is based on the joint use of multiple moments [3], whereas
the empirical characteristic function approach employs
different values of the parameter in the exponential. In some
cases, such as linearly filtered signals, the non-Gaussian
nature is difficult to detect from the analysis of the marginal
distribution. For this reason, multivariate detectors have
been proposed. A unified theory of time domain
Gaussianity tests based on finite memory non-linearities has been
very recently presented in [4]. It is based on Price's
theorem, which relates the moments of Gaussian n-variates to
the moments of given functions of these variates. This
approach sheds light on how tests might be designed in general.
Specifically, higher order moment based tests and
empirical characteristic function based tests are derived for the
n-variate case. In this contribution, we propose an
alternative approach based on the Bussgang property.

II  Bussgang-based  Method 

As is well known, Bussgang's theorem states that the
cross-correlation function of a Gaussian stationary process and
of its version passed through a zero-memory non-linearity is
proportional to the auto-correlation function of the process,
namely

$E\{x[n+k] \cdot g(x[n])\} = k_g \cdot E\{x[n+k] \cdot x[n]\}$

The proportionality factor $k_g$ depends on the non-linearity
$g(\cdot)$ and it can be expressed as [5]

This property has been extended to complex processes in
[6, 7] and generalized to the multivariate case in [8].

Even though Bussgang's theorem is implied by Price's
theorem, it presents peculiar aspects which make it particularly
attractive for various applications of signal processing. For
instance, it constitutes the principle underlying some well
known techniques of blind deconvolution employed in data
communication systems and in geophysics.

The Gaussianity test based on Bussgang's theorem simply
consists of measuring the deviation from proportionality of
the sample cross-correlation to the sample auto-correlation
for a given non-linearity, with respect to the theoretical
proportionality factor. In particular, we first estimate, according to
the mean square error criterion, the proportionality factor
$k_g$ from the overdetermined set of equations:

$R_{xg}(k) = k_g \cdot R_{xx}(k)$ (3)

with $k = 0, \ldots, N$, where the auto-correlation $R_{xx}(k)$ and
the cross-correlation $R_{xg}(k)$ functions are estimated for
a given number $N + 1$ of lags from the available set of
data. In the subsequent step, we ask whether the
estimated proportionality factor is significantly close to the
theoretical factor $k_g^{G}$ derived under the Gaussian
assumption. This is accomplished by observing the sample
distribution of the weighted (by the sample variance $R_{xx}(0)$)
non-Gaussianity error, i.e.:

$W[R_{xx}(0)] \cdot (k_g - k_g^{G})$

This makes the design of a test simple and meaningful; in
fact, only one single non-linearity must be chosen. On the other
hand, the test is multivariate by nature. Its
dimensionality depends on how many correlation lags are included in
the test. It is easily verified that this approach generates
well defined higher order moment-based tests. Moreover,
it allows the empirical characteristic function based test to be
extended to the multivariate case without the need to introduce
multiple parameters, as in previous approaches.
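The two steps of the Bussgang-based procedure (least-squares fit of the proportionality factor over several lags, then comparison with its Gaussian value) can be sketched as follows for the signum non-linearity. This is a minimal illustration under stated assumptions: the function names are hypothetical, the theoretical factor is obtained from the theorem at lag 0 as $E\{x\,g(x)\}/E\{x^2\}$, which equals $\sqrt{2/\pi}/\sigma$ for $g = \mathrm{sign}$ and a zero-mean Gaussian input of standard deviation $\sigma$, and no decision threshold is computed.

```python
import numpy as np


def correlations(x, g, n_lags):
    """Sample R_xx(k) and R_xg(k) for k = 0..n_lags (biased towards short lags)."""
    x = x - x.mean()
    gx = g(x)
    n = x.size
    rxx = np.array([np.mean(x[k:] * x[: n - k]) for k in range(n_lags + 1)])
    rxg = np.array([np.mean(x[k:] * gx[: n - k]) for k in range(n_lags + 1)])
    return rxx, rxg


def kg_sign(variance):
    """Gaussian-case Bussgang factor for g = sign: sqrt(2/pi) / sigma."""
    return np.sqrt(2.0 / (np.pi * variance))


def bussgang_error(x, n_lags=4):
    """Weighted non-Gaussianity error R_xx(0) * (estimated k_g - Gaussian k_g)."""
    rxx, rxg = correlations(x, np.sign, n_lags)
    kg_hat = np.dot(rxg, rxx) / np.dot(rxx, rxx)   # least squares over all lags
    return rxx[0] * (kg_hat - kg_sign(rxx[0]))


def ar1(e, rho=0.5):
    """First-order autoregressive filtering of the sequence e."""
    x = np.empty_like(e)
    x[0] = e[0]
    for n in range(1, e.size):
        x[n] = rho * x[n - 1] + e[n]
    return x


rng = np.random.default_rng(2)
err_gauss = bussgang_error(ar1(rng.standard_normal(20000)))
err_binary = bussgang_error(rng.choice([-1.0, 1.0], 20000))
print(err_gauss, err_binary)
```

For the correlated Gaussian sequence the error should be close to zero, whereas for the binary sequence it should settle near $1 - \sqrt{2/\pi}$. Turning this error into a test still requires a threshold matched to the desired significance level.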

III Some Application Examples

In this contribution, we have first presented the
theoretical bases leading to the formulation of the Bussgang test.
Attention should then be focused on the choice of the
non-linearity. In this respect, it should be noted that no
rational criteria are currently available for the choice of a
specific test for the application at hand, apart from obvious
considerations, for instance that no odd powers should be employed
for symmetrical distributions. Often, more knowledge is a
priori available about possible deviations from
Gaussianity. For instance, this is evident in the case where one
is interested in detecting the presence of signals in noise, or
in revealing whether Gaussian sources are affected by some kind
of distortion. In those cases, we would need to choose
among different tests the one giving the best performance
at the same computational cost or for the same sample size.

Thus, in order to gain insight into the test characteristics,
we compare the performance of different tests versus a class
of input distributions indexed by continuous parameters.
For simplicity, we first refer to the univariate case and to
a generalized Gaussian distribution, defined by:


With this model, we can trace the behavior of different
tests even for small deviations from Gaussianity, thus
characterizing their sensitivity with respect to the distribution
parameter.

A further application example has been numerically
studied by considering an additive mixture of a
non-Gaussian binary signal and independent Gaussian noise
realizations, as happens in a number of data
communication links, i.e.

$p_x(x) = \frac{1}{2}\left(g(m, \sigma^2) + g(-m, \sigma^2)\right)$ (5)

where $g(m, \sigma^2)$ is the normal distribution of mean $m$ and
variance $\sigma^2$. The test has been carried out for several
values of the Signal-to-Noise ratio (SNR $= m^2/\sigma^2$) in order
to assess the validity of the Gaussianity test even under
low-level signal conditions.
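Both families of test inputs are easy to simulate. The sketch below is a hedged illustration: the defining formula of the generalized Gaussian density is not legible in the source, so the usual form with density proportional to $\exp(-(|x|/\beta)^{\alpha})$ is assumed, with $\beta$ chosen for unit variance; the function names and the seed are hypothetical. The binary-plus-noise mixture follows the SNR definition $m^2/\sigma^2$ given above.

```python
import numpy as np
from math import gamma, sqrt


def generalized_gaussian(rng, alpha, size):
    """Unit-variance samples with density proportional to exp(-(|x|/beta)**alpha).

    If Y ~ Gamma(1/alpha, 1), then beta * Y**(1/alpha) has the magnitude law.
    """
    beta = sqrt(gamma(1.0 / alpha) / gamma(3.0 / alpha))
    magnitude = beta * rng.gamma(1.0 / alpha, 1.0, size) ** (1.0 / alpha)
    return magnitude * rng.choice([-1.0, 1.0], size)


def binary_in_noise(rng, snr_db, size, sigma=1.0):
    """Equiprobable +/- m symbols plus Gaussian noise, SNR = m^2 / sigma^2."""
    m = sigma * 10.0 ** (snr_db / 20.0)
    return rng.choice([-m, m], size) + sigma * rng.standard_normal(size)


def kurtosis(x):
    """Normalised fourth moment (equal to 3 for a Gaussian)."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2


rng = np.random.default_rng(3)
n = 200000
k_laplace = kurtosis(generalized_gaussian(rng, 1.0, n))   # super-Gaussian
k_gauss = kurtosis(generalized_gaussian(rng, 2.0, n))     # Gaussian case
k_flat = kurtosis(generalized_gaussian(rng, 4.0, n))      # sub-Gaussian
k_mix = kurtosis(binary_in_noise(rng, 0.0, n))            # SNR = 0 dB
print(k_laplace, k_gauss, k_flat, k_mix)
```

With this parametrisation the exponent 2 gives the Gaussian case, smaller exponents give heavier tails (kurtosis above 3) and larger exponents give flatter densities (kurtosis below 3). The binary mixture is sub-Gaussian as well.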

IV  Numerical  Results 

In particular, we have theoretically evaluated the
significance level (i.e. the probability of rejecting the Gaussian
hypothesis when the signal is actually Gaussian) and the
power of the various tests (i.e. the probability of
rejecting the Gaussian hypothesis when the signal is not actually
Gaussian).

We have carried out computer simulations to assess the
analytical results. For the sake of simplicity, we have considered
a univariate error function and the sample variance as the
error weighting function, i.e.

$W[R_{xx}(0)] = R_{xx}(0)$ (6)

in order to meet the reference case widely encountered in
the literature. We have considered both white and first
order AR-filtered ($\rho = 0.5$) signals of several
lengths; we have then collected the percentage of
rejections of the Gaussian hypothesis, fixing all the
employed thresholds so as to obtain a significance level of 5%
for each test. Different generalised Gaussian distributions
have been considered by varying the parameter $\alpha$, which
characterizes both the super- and sub-Gaussian behavior
of the (symmetrical) distribution. Some of the obtained
results are shown in Figs. 1 and 2 for normalized (unit
variance) distributions, i.e.

The results show that for sub-Gaussian distributions
($\alpha > 2$) the Koutruvelis-Epps test outperforms the
Giannakis-Tsatsanis test, while their merits are comparable for
super-Gaussian distributions ($\alpha < 2$). This provides a limited, but
significant, choice criterion in many applications.
Searching for a different test inspired by the Bussgang paradigm, we
have then considered a simple test based on a
non-linearity constituted by the signum function. Interestingly enough,
the Bussgang-signum test exhibits the best performance for
sub-Gaussian distributions. This is particularly significant
in the presence of low-pass filtered non-Gaussian signals
and for small sample sizes.

Such a result is not surprising, because it indirectly
relates to known circumstances. In fact, the signum distortion
is successfully employed for phase retrieval in blind
deconvolution of communication channels where data exhibit a
strong sub-Gaussian behavior; conversely, the power
non-linearity is employed for blind deconvolution of seismic
signals, which are super-Gaussian [9].

The results of the simulations performed for the case
of a binary signal embedded in uncorrelated Gaussian
noise (see Figs. 3 and 4) have confirmed the
promising behaviour of the signum non-linearity. Once again,
the signum-based test yields the best performance for
both white and correlated data. Conversely, the
Giannakis-Tsatsanis test appears unsuitable for such purposes,
probably because the signum-based test statistics result in better
behaved histograms than higher-order statistics, which are
affected by side events.

V  Conclusion 

A procedure for deciding whether a finite segment of
a signal can be considered as a realization of a Gaussian
process or not has been developed. The theoretical bases
leading to the formulation of the Bussgang test have been
presented. A novel test based on the sign non-linearity
has been introduced and its performance analysed in
comparison with two simple Gaussianity tests presented in the
literature.

The results have shown a very promising behaviour of
the suggested method when only small amounts of data are
available, for both white and correlated samples. Future
investigations will be performed to confirm this trend in the case
of multiple lags.
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%  of  rejection  ^  %  of  rejection 


Figure  1:  Percentage  of  rejection  of  the  Gaussian 
hyphotesis  vs.  the  non-Gaussianity  parameter  a  of 
the  generalised  Gaussian  distribution  for  128  white 
samples. 


Figure 2: Percentage of rejection of the Gaussian
hypothesis vs. the non-Gaussianity parameter a of the
generalised Gaussian distribution for 256 correlated
(ρ = 0.5) samples.




Figure 3: Percentage of rejection of the Gaussian
hypothesis vs. the SNR (dB) of the additive mixture of
non-Gaussian binary signal and independent Gaussian
noise for 128 white samples.




Figure 4: Percentage of rejection of the Gaussian
hypothesis vs. the SNR (dB) of the additive mixture of
non-Gaussian binary signal and independent Gaussian
noise for 256 correlated (ρ = 0.5) samples.

Abstract

A  modification  to  a  previously  developed  characteristic 
function  based  Gaussianity  test  is  proposed.  The  power  of 
the  test  is  consequently  improved.  This  test  is  then  extended 
to  the  multivariate  case,  allowing  it  to  be  applied  to  cor¬ 
related  data.  Monte  Carlo  simulations  are  performed  to 
compare  power  with  two  other  tests  for  multivariate  Gaus¬ 
sianity,  with  encouraging  results. 


1.  Introduction 

The assumption of Gaussianity is important to verify in
many areas. Tests for Gaussianity, using the characteristic
function (cf), have been developed with promising results
[3, 5, 8]. However, these tests perform well (achieve high
power) for independent and identically distributed (iid) data
only. This limits their applicability to real situations where
the data is often correlated. It is therefore necessary to
address the issue of testing correlated data for Gaussianity
using the cf. This paper formulates tests for Gaussianity of
correlated data and refers to them as Multivariate
Gaussianity Tests.

Other  tests  for  multivariate  Gaussianity  have  generally 
lagged  similar  univariate  tests  [6]  due  to  an  increase  in  com¬ 
plexity  and  have  required  large  numbers  of  samples  (in  the 
thousands)  [2].  In  this  paper  we  show  that  the  Gaussianity 
tests  based  on  the  cf  are  readily  extendable  to  the  multivari¬ 
ate  case  and  achieve  high  power,  while  not  being  limited  to 
large  samples. 

2. Characteristic function based tests for Gaussianity

It has already been shown [3, 4] that the cf can be used
as the basis of Gaussianity testing. In [8], problems related
to the use of the empirical characteristic function (ecf) for
such tests were highlighted. The ecf is an unbiased
estimator of the cf of a process based on a finite number of
samples, N; however, it has unacceptably high variance, see
Figure 1. In fact, its variance approaches a constant (1/2N)
as t approaches infinity [1].


Figure  1.  Magnitudes  of  the  ecfs  of  10  reali¬ 
sations  of  a  standard  Gaussian  random  vari¬ 
able,  with  the  true  Gaussian  cf. 


It was proposed that the kernel characteristic
function estimator (KCFE) be used instead of the ecf. The
development of the KCFE was motivated by analogy with
the theory of kernel density estimation [7]. Briefly, the
KCFE is produced by multiplying an initial estimate of the
characteristic function of a process, specifically the ecf, by a
kernel. The kernel used in [8] was Gaussian. This smooths
out variations in the large t region of the estimate, see
Figure 2, at the expense of the introduction of a bias.

its variance will approach zero as t increases without
introducing a bias, see Figure 3.


Figure 2. The effect of multiplicative smoothing
on the ecf of 1 realisation of a Gaussian
process.


3. A new approach to overcoming non-vanishing
variance of the estimator


Figure 3. The effect of multiplicative smoothing
on the difference between the ecf and the
cf under H0 for 1 realisation of a Gaussian
process.


In  this  paper,  we  consider  another  method  of  overcoming 
the  problem  of  the  non-vanishing  variance  of  the  ecf.  This 
approach  does  not  attempt  to  find  a  better  estimate  of  the  cf, 
rather,  it  recognises  that  a  better  measure  of  the  difference 
between  the  data  and  the  distribution  under  the  null  hypoth¬ 
esis  is  required.  We  consider  smoothing  this  difference,  and 
not  smoothing  the  estimate  as  is  done  for  the  KCFE  method. 

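Let $X = [X_0, \ldots, X_{N-1}]$ be a vector of N observations.
Consider a function $\varepsilon_X(t) = \hat{\phi}_X(t) - \phi_X(t)$ where

$$\hat{\phi}_X(t) = \frac{1}{N} \sum_{n=0}^{N-1} e^{jtX_n}$$

is the ecf and $\phi_X(t)$ is the cf under $H_0$. From [1],

$$E[\varepsilon_X(t)] = 0$$

$$\mathrm{var}[\Re\,\varepsilon_X(t)] = \frac{1}{2N}\left(1 + \phi_X(2t) - 2\phi_X^2(t)\right)$$

$$\mathrm{var}[\Im\,\varepsilon_X(t)] = \frac{1}{2N}\left(1 - \phi_X(2t)\right)$$

Note that the variance of $\varepsilon_X(t)$ is the same as that of the
ecf.

If this function is now subjected to a multiplicative
smoothing operation by $\varphi_K(t)$, a cf domain kernel
function (ckf) similar to that used in the KCFE,

$$\tilde{\varepsilon}_X(t; \varphi_K) = \varepsilon_X(t)\,\varphi_K(t)$$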


Figure  4  shows  that  the  smoothed  difference  function  has 
lower  mean  square  errors  (MSE)  than  both  the  ecf  (due  to 
reduction  in  variance)  and  the  KCFE  (due  to  the  removal  of 
bias). 
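
The distinction between smoothing the estimate (KCFE) and smoothing the difference can be illustrated with a minimal sketch. It assumes a standard Gaussian null hypothesis and a Gaussian kernel of the form exp(-(width·t)²/2); the function names and the `width` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def ecf(x, t):
    """Empirical cf of the samples x evaluated at each point in t."""
    x = np.asarray(x, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.exp(1j * np.outer(t, x)).mean(axis=1)

def gaussian_cf(t):
    """cf of a standard Gaussian variable (the null hypothesis assumed here)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.exp(-0.5 * t ** 2)

def gaussian_ckf(t, width):
    """Assumed Gaussian cf-domain kernel; `width` is a hypothetical inverse width."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.exp(-0.5 * (width * t) ** 2)

def kcfe(x, t, width):
    """KCFE: the kernel multiplies the estimate itself, which biases it."""
    return ecf(x, t) * gaussian_ckf(t, width)

def smoothed_difference(x, t, width):
    """The kernel multiplies the difference between the ecf and the null cf."""
    return (ecf(x, t) - gaussian_cf(t)) * gaussian_ckf(t, width)

rng = np.random.default_rng(0)
x = rng.standard_normal(128)
t = np.linspace(0.0, 6.0, 25)
raw = ecf(x, t) - gaussian_cf(t)
smooth = smoothed_difference(x, t, width=0.5)
```

Because the kernel never exceeds one in magnitude, the smoothed difference is never larger than the raw difference at any t, and its expectation remains zero under the null hypothesis.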


Figure  4.  The  Mean  Square  Errors  of  the 
ecf,  KCFE  and  smoothed  difference  functions 
from  25  realisations  of  a  Gaussian  process. 


The results of 500 Monte Carlo simulations, comparing
the performance of the KCFE and smoothed difference
methods, are shown in Table 1. They show that
improvements have been made for most distributions tested, with
the exception of the Uniform(0,1) and β(4,4) cases.

A possible reason for this is that these distributions are
the only ones tested whose cfs are actually less than the
Gaussian cf at any value of t. Consequently, the difference
between their cfs and the Gaussian cf oscillates above and
below zero, as opposed to the other distributions, whose cfs
are generally above the Gaussian cf. This makes it more difficult
for the test statistics used in this paper to distinguish them from
the Gaussian case.


Distribution¹    KCFE     Sm. diff.
U(0,1)           95.0     82.4
χ²               99.6     99.8
χ²               59.6     66.8
L                29.2     65.0
K(1,1)           85.0     90.0
LN(0,1)          100.0    100.0
β(4,4)           14.0     10.2
Average          68.9     73.5

Table 1. Power of the KCFE and smoothed
difference cf based tests.


4.  The  multivariate  case 

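Consider a vector valued random variable

$$X = [X_1, X_2, \ldots, X_M]^T$$

where $X_m = [X_{m,0}, \ldots, X_{m,N-1}]$ is the vector
of N observations on the mth variable for m =
1, 2, ..., M. We assume correlation exists between the
variables $X_1, X_2, \ldots, X_M$. The multivariate ecf may be given
by

$$\hat{\phi}_X(t_1, t_2, \ldots, t_M) = \frac{1}{N} \sum_{n=0}^{N-1} \exp\left( j \sum_{m=1}^{M} t_m X_{m,n} \right)$$

The smoothed difference between the multidimensional
ecf and the cf under $H_0$ is

$$\tilde{\varepsilon}_X(t_1, t_2, \ldots, t_M) = \varepsilon_X(t_1, t_2, \ldots, t_M)\,\varphi_K(t_1, t_2, \ldots, t_M)$$

where

$$\varepsilon_X(t_1, t_2, \ldots, t_M) = \hat{\phi}_X(t_1, t_2, \ldots, t_M) - \phi_X(t_1, t_2, \ldots, t_M)$$

and $\varphi_K(t_1, t_2, \ldots, t_M)$ is a multidimensional ckf.

¹Some of the non-Gaussian distributions considered are U: Uniform,
L: Laplace, K: K-distribution, LN: Log-Normal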

4.1.  Test  statistic 

Finding  a  meaningful  single  measure  of  the  difference 
between  the  estimated  cf  and  cf  under  Ho  is  very  important 
to  the  performance  of  this  test.  The  cf  of  every  distribution 
is  unique,  and  therefore  the  difference  between  it  and  the 
Gaussian  cf  will  also  be  unique.  The  chosen  statistic,  along 
with  the  chosen  set  of  t  values  at  which  to  evaluate  the 
statistic,  must  take  into  account  a  wide  range  of  these  distinct 
cfs  in  order  to  produce  an  omnibus  Gaussianity  test  -  that  is, 
a  test  that  is  powerful  against  all  non-Gaussian  alternatives. 

For  the  purposes  of  this  paper,  a  number  of  different  test 
statistics  were  investigated;  such  as  using  the  maximum  and 
mean  operators  on  the  difference  between  the  magnitude  of 
the  ecf  and  the  Gaussian  cf,  the  absolute  difference  and  the 
squares  of  the  difference.  It  was  found  that  the  absolute  value 
of  the  mean  of  the  difference  provided  the  most  powerful  test 
for the conditions used. This statistic, which we shall denote
by $Q_X$, is compared to the empirically derived threshold to
determine the acceptance or rejection of $H_0$.

Optimisation  of  the  choice  of  Qx  is  still  possible. 

4.2.  Choice  of  t  values 

As  mentioned  in  subsection  4.1,  the  choice  of  values  of 
t  at  which  to  evaluate  the  test  statistic  can  be  crucial  to  the 
power  of  the  test  against  a  particular  alternative.  The  cfs  of 
some distributions are very similar to the Gaussian cf at some
values  of  t,  while  vastly  different  at  others.  The  values  used 
must  be  able  to  provide  good  results  against  a  large  number 
of  alternatives. 

If  increased  sensitivity  against  a  particular  distribution 
were  required,  it  would  be  possible  to  determine  at  what 
points  the  cfs  under  Ho  and  Hi  differ  the  most  and  evaluate 
them  only  at  these  points.  Due  to  the  variance  associated 
with  the  estimator,  this  maximises  the  power  of  the  test. 
Evaluation  at  a  small  number  of  points  also  makes  the  test 
easier  to  implement  and  quicker  to  run,  especially  in  the 
higher-dimensional  spaces  used  in  the  multivariate  case. 

Finding  optimal  t  values  is  beyond  the  scope  of  this  paper 
and  will  be  reported  elsewhere. 

4.3.  Characteristic  function  domain  kernel  function 

The choice of ckf when using the KCFE method was
investigated in [8]. For a standardised Gaussian process, a
Gaussian kernel was proposed, which for the multivariate
case is,

)  ^2}  •  •  • }  ^m)  “  exp 

where  is  the  kernel  function  (inverse)  width.  The 
optimum  ajes  was  found  to  be  approximated  by  = 
0.977^“®-^,  where  N  is  the  number  of  data  points.  Be¬ 
cause  of  the  similarity  between  the  KCFE  and  the  smoothed 
difference  method  used  in  this  paper,  we  will  use  this  kernel. 


Distribution²                    Q_X      W        S_W²
χ²(2,2,1)                        96.2     99.7     95.3
χ²(2,2,2)                        90.2     97.3     88.6
χ²(2,2,3)                        81.7     91.2     83.0
t_3                              90.7     75.5     86.3
Cauchy                           100      100      100
0.8 N2(0,Σ) + 0.2 N2(0,9Σ)       92.5     78.1     93.6
0.9 N2(0,Σ) + 0.1 N2(0,16Σ)      89.2     85.8     93.1
Log Normal                       99.8     100      99.4
Average                          92.5     91.0     92.4

5.  Results  and  discussion 

The power of the test for the bivariate case is investigated
through Monte Carlo simulations. The results are presented
in Table 2, for N = 20 data points, and Table 3, for N = 50 data
points. The power of the proposed test is compared
to bivariate tests based on an adaptation of the Shapiro-Wilk
W test and Mardia and Foster's S_W² test [6].


Distribution²                    Q_X      W        S_W²
χ²(2,2,1)                        63.3     75.6     49.4
χ²(2,2,2)                        51.8     59.2     42.4
χ²(2,2,3)                        43.1     48.1     35.4
t_3                              54.8     47.6     48.6
Cauchy                           97.6     96.9     94.9
0.8 N2(0,Σ) + 0.2 N2(0,9Σ)       60.9     44.3     57.0
0.9 N2(0,Σ) + 0.1 N2(0,16Σ)      59.5     52.3     61.9
Log Normal                       89.0     87.6     70.2
Average                          65.0     64.0     57.5

Table 2. Percentage of realisations rejected at
a 5% level of significance for N = 20.


For this investigation, 5000 replications of each test were
performed. The level of significance was set at 5% and
thresholds were derived empirically using bivariate
Gaussian data with marginal means of zero and covariance matrix

$$\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}.$$

All data was standardised by marginal means and
variances, then the correlation between the two
variables, M, was estimated. The bivariate Gaussian cf,
with correlation matrix M, was then evaluated at t =
[-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]. The difference between
this cf and the magnitude of the ecf was then smoothed by
the Gaussian ckf described in section 4.3. Taking the
magnitude of the mean of this function produced the test statistic
$Q_X$ which was compared to the empirically derived
threshold.

²The bivariate chi-square distributions, χ²(ν1, ν2, ν), are the joint
distributions of W1 and W2 where Wi = Vi + V. Each Vi is an independent
χ² variate with νi degrees of freedom and V is a χ² variate with ν degrees
of freedom.

Table 3. Percentage of realisations rejected at
a 5% level of significance for N = 50.
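
The bivariate procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the kernel is taken to be exp(-width²(t1² + t2²)/2) with a hypothetical `width` value, the t values are applied as a square grid in both dimensions, and H0 is assumed to be rejected when the statistic exceeds the empirical threshold.

```python
import numpy as np

T_GRID = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])  # t values quoted in the text

def q_statistic(data, width):
    """Smoothed-difference statistic for an (N, 2) data array.

    `width` is a hypothetical kernel (inverse) width, not a value from the paper.
    """
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)  # standardise margins
    rho = np.corrcoef(z, rowvar=False)[0, 1]                   # estimated correlation
    t1, t2 = np.meshgrid(T_GRID, T_GRID, indexing="ij")
    phase = t1[..., None] * z[:, 0] + t2[..., None] * z[:, 1]
    ecf_mag = np.abs(np.exp(1j * phase).mean(axis=-1))         # magnitude of the ecf
    cf0 = np.exp(-0.5 * (t1 ** 2 + 2.0 * rho * t1 * t2 + t2 ** 2))  # Gaussian cf
    kernel = np.exp(-0.5 * width ** 2 * (t1 ** 2 + t2 ** 2))   # assumed Gaussian ckf
    return np.abs(np.mean((ecf_mag - cf0) * kernel))

def empirical_threshold(n, width, reps, rng, level=0.05):
    """Upper `level` quantile of the statistic under bivariate Gaussian data."""
    cov = np.array([[1.0, 0.5], [0.5, 1.0]])
    q = [q_statistic(rng.multivariate_normal([0.0, 0.0], cov, size=n), width)
         for _ in range(reps)]
    return np.quantile(q, 1.0 - level)

rng = np.random.default_rng(1)
threshold = empirical_threshold(n=50, width=0.5, reps=300, rng=rng)
reject = q_statistic(rng.standard_cauchy((50, 2)), width=0.5) > threshold
```

Deriving the threshold by simulation avoids the need for the null distribution of the statistic in closed form, at the cost of repeating the simulation for each sample size.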

The results in Tables 2 and 3 show an increase in average
power by the smoothed difference cf based test, Q_X, against
both the W and S_W² tests. Each test has some distributions
against which it is more powerful; however, none of the tests
performs best for all distributions.

An important feature of the performance of the Q_X test
is that, of the results shown, only once was it the worst
performing of the three tests: this was for the χ²(2,2,3)
distribution when N = 50. Both the W and the S_W² tests had
their relative strengths and weaknesses; however, the Q_X
test performed consistently well. For example, the W test
was the best performing for all the bivariate χ² distributions,
while the S_W² test generally achieved higher rejection rates
for the other distributions.

The  Average  rejection  rate  should  be  viewed  with  caution, 
as  it  really  only  has  meaning  if  all  8  distributions,  and  only 
these  distributions,  are  equally  likely  to  occur. 

The  difference  in  relative  performance  of  the  Qx  test  did 
not  appear  to  be  affected  by  a  change  in  the  number  of  data 
points  used,  N. 

Several  other  features  of  cf  based  tests  make  them  ap¬ 
pealing  compared  to  the  other  tests  available  and  mentioned 
in  this  paper.  Cf  based  tests  can  easily  be  adapted  to  test  for 
any  known,  fixed  distribution,  and  their  sensitivity  to  cer¬ 
tain  alternatives  can  be  adjusted  through  the  choice  of  test 
statistic  and  t  values. 

6.  Conclusions 

We have proposed an improvement to the KCFE based
Gaussianity test by smoothing the difference between the ecf
and cf under H0. Additionally, this test has been extended
to the multivariate case, allowing it to be used on correlated
data. An empirical power study and comparison to other
tests revealed encouraging results.

Further studies into the performance of this test are
certainly possible, as well as detailed investigation into
optimisation of some aspects of the test, namely, the choice of a
test statistic and the range of t values at which to evaluate
the cfs.

Abstract

Both the fractional Brownian motion (fBm) and the
Autoregressive Integrated Moving Average (ARIMA) models have
been applied to teletraffic scenarios in recent years. These
models became popular after the discovery that Ethernet
and VBR video data appear to possess the property of
self-similarity. However, the results presented in this paper
suggest that Ethernet data is more impulsive than traffic
generated by these models.


1  Introduction 

Teletraffic modelling plays an important part in network
design and resource allocation. It allows engineers to
predict the network behaviour before physical construction and
permits experimentation without endangering real systems.
However, for modelling to be effective the system must be
simulated as accurately as possible. For this reason the
time-series analysis and modelling of teletraffic scenarios
has become a significant area of research.

One of the most important discoveries in this area in
recent years was the result of work carried out at Bellcore
Labs on Ethernet traces [5]. They discovered that such
traces were similar in distribution across a wide range of
time scales (from milliseconds to hundreds of seconds).
This phenomenon was due to the traces being self-similar
in nature, which implied that they also exhibited long range
dependency (LRD).

The  importance  of  this  discovery  becomes  apparent  when 
it  is  observed  that  Poisson,  ARMA  and  Markov  processes 
are  unable  to  exhibit  LRD.  In  fact  they  are  short  range  de¬ 
pendent  (SRD)  processes,  but  until  this  discovery  almost  all 
teletraffic  modelling  research  had  been  based  upon  them. 
Obviously  new  models  were  required  and  the  most  popu¬ 
lar  to-date,  those  based  on  fBm  and  ARIMA  processes,  are 
introduced  in  the  next  section. 

This paper concentrates on examining the original
Bellcore Ethernet data and attempting to determine whether
the new models (which generate traffic with Gaussian
innovations) are optimal. The evidence is presented in increasing
order of quantitative strength and suggests that a more
impulsive innovation distribution than the Gaussian may be
required.


2  Self-similar  processes  and  models 


If x(t) is a random process then it is defined as self-similar
iff

$$x(at) \stackrel{d}{=} a^H x(t), \quad a > 0, \; H \in (0, 1]. \qquad (1)$$

H is the Hurst exponent and is a measure of self-similarity;
it lies in the range 0.5 < H < 1 for a self-similar process,
and $\stackrel{d}{=}$ denotes equal in distribution.

Self-similar time-series have some interesting properties:

• They possess a hyperbolically decaying
autocorrelation function of the form

$$r(k) \sim k^{2H-2} L(k) \quad \text{as } k \to \infty, \qquad (2)$$

where L(t) is a slowly varying function at infinity¹.
Therefore the autocorrelation function is unsummable, i.e.

$$\sum_k r(k) = \infty. \qquad (3)$$

This infinite sum is the definition of long range
dependency, so all self-similar signals are long-range
dependent.

• The variance of the sample mean decays more slowly than
the reciprocal of the number of points in the sample, m,

$$\mathrm{Var}(X^{(m)}) \propto m^{2H-2}. \qquad (4)$$

This is why sample statistics such as the mean and
variance are slow to converge.

• The power spectrum obeys a 1/f type law close to
the origin,

$$f(\lambda) \propto \lambda^{1-2H} \quad \text{as } \lambda \to 0. \qquad (5)$$

This is why self-similar and long range dependent
processes are sometimes termed 1/f-noise.

¹i.e. $\lim_{t \to \infty} L(tx)/L(t) = 1$ for all x > 0

0-8186-8005-9/97 $10.00 © 1997 IEEE
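
The slow decay of the variance of block means suggests a simple way to estimate H, sketched below. It assumes the standard scaling Var(X^(m)) ∝ m^(2H-2) for block means of size m, so that the slope of log-variance against log-m equals 2H - 2. The function name and block sizes are hypothetical, and this is not the Whittle estimator used later in the paper.

```python
import numpy as np

def aggregated_variance_hurst(x, block_sizes):
    """Estimate H from the slope of log Var(block means) against log m."""
    x = np.asarray(x, dtype=float)
    variances = []
    for m in block_sizes:
        n_blocks = x.size // m
        # Mean of each non-overlapping block of m consecutive samples.
        means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        variances.append(means.var(ddof=1))
    slope, _ = np.polyfit(np.log(block_sizes), np.log(variances), 1)
    return 1.0 + slope / 2.0  # slope = 2H - 2

rng = np.random.default_rng(0)
white = rng.standard_normal(2 ** 16)
h_white = aggregated_variance_hurst(white, [4, 8, 16, 32, 64, 128])
```

For uncorrelated noise the variance of the block mean falls as 1/m, so the estimate should be close to 0.5; a long-range dependent series gives a shallower slope and a larger H.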

Obviously  these  properties  cannot  be  reproduced  by  SRD 
processes.  So  processes  that  are  capable  of  producing  time- 
series  with  these  properties  were  required.  Two  processes 
that  were  already  well  known  in  other  fields,  but  were  ap¬ 
plicable  to  self-similar  teletraffic  modelling,  were  fBm  [6] 
and  ARIMA  [3]  processes. 

fBm is a non-stationary process, $B_H(t)$, with stationary
increments (these increments are often termed fractional
Gaussian noise (fGn)) and is defined as

$$B_H(t) = C \left\{ \int_{-\infty}^{0} \left[ (t-s)^{H-1/2} - (-s)^{H-1/2} \right] dB(s) + \int_{0}^{t} (t-s)^{H-1/2} \, dB(s) \right\} \qquad (6)$$

where B(.) is standard Brownian motion and C is a
normalising constant. Norros developed a teletraffic model that
uses fBm innovations [7] to produce a cumulative arrival
stream, A(t), in accordance with

$$A(t) = mt + \sqrt{am}\, B_H(t). \qquad (7)$$

Here, m is defined as the mean arrival rate per second and a
is the variance parameter.

The second process is based on the SRD ARMA model
but incorporates an additional fractional difference term. If
B is defined as the backshift operator (i.e. $B(x_t) = x_{t-1}$,
$B^2(x_t) = x_{t-2}$) and $\varphi(.)$ and $\psi(.)$ are polynomial
functions of order p and q respectively then the ARIMA(p, d, q)
process is given as

$$\varphi(B)(1 - B)^d x_t = \psi(B) e_t. \qquad (8)$$

$e_t$ is the excitation noise and if $-\frac{1}{2} < d < \frac{1}{2}$ then the process
is self-similar. In practice the ARIMA trace is often obtained
by generating a fGn trace with a suitable H and filtering this
noise with the ARMA coefficients.

The important points to note from this section are that the
stationary increments of $B_H(t)$ are drawn from a Gaussian
distribution, i.e.

$$B_H(t) - B_H(t - p) \sim N(0, \sigma^2), \qquad (9)$$

and that the excitation noise for the ARIMA model, $e_t$, is
Gaussian in nature.
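
A true fGn trace of modest length can be generated directly from its autocovariance, as in the following minimal sketch. It uses the standard unit-variance fGn autocovariance γ(k) = ½(|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H)) and a Cholesky factorisation, which costs O(N³) and is only practical for short traces; the paper does not state which generator was used, and the function names are hypothetical.

```python
import numpy as np

def fgn_autocovariance(k, hurst):
    """Autocovariance of unit-variance fGn at integer lag(s) k."""
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1) ** (2 * hurst)
                  - 2 * k ** (2 * hurst)
                  + np.abs(k - 1) ** (2 * hurst))

def sample_fgn(n, hurst, rng):
    """Draw n samples of fGn for 0 < hurst < 1 via Cholesky factorisation."""
    lags = np.arange(n)
    cov = fgn_autocovariance(lags[:, None] - lags[None, :], hurst)
    chol = np.linalg.cholesky(cov)  # fails at hurst = 1, where cov is singular
    return chol @ rng.standard_normal(n)

rng = np.random.default_rng(0)
trace = sample_fgn(256, hurst=0.8, rng=rng)
```

Each sample is a linear combination of Gaussian variables, so the trace is Gaussian by construction; for H = 0.5 the covariance reduces to the identity and the trace is white noise.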

3  Testing  the  Gaussian  assumption 

In  section  2  it  was  shown  that  both  of  the  most  studied 
self-similar  models  generate  traffic  by  performing  a  linear 
transform  on  fGn.  In  this  section  we  compare  true  fGn  with 
what  is  assumed  to  be  fGn  in  the  Bellcore  Ethernet  data. 


3.1  The  data 

The Bellcore data consists of three files (pAug.TL,
pOct.TL and OctExt.TL); the means by which these
files were constructed is detailed in [5]. For each file a
family of work per time unit discrete data sets, $W^{\Delta t}[n]$, was
constructed. This was done by selecting a suitable time unit
and totalling the number of Ethernet bytes recorded per time
unit, for the entire trace.

3.2  Data  transform 

To obtain the assumed fGn trace consider the following.
Assume $\tilde{W}^{\Delta t_1}[n]$ is the cumulative work per $\Delta t_1$ seconds
data set, then

$$\tilde{W}^{\Delta t_1}[n] = m n \Delta t_1 + \sqrt{am}\, B_H[n]. \qquad (10)$$

By differencing (10) over $\Delta t_1$, the following expression for
the arrivals of the work data set is obtained,

$$W^{\Delta t_1}[n] = m \Delta t_1 + \sqrt{am}\, y[n]. \qquad (11)$$

So the data sets are assumed to be composed of some mean
term plus a scaled fGn process, y[n], which can be
normalised by estimating the value of $\sqrt{am}$.
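
The construction of a work-per-time-unit data set and its normalisation can be sketched as below. The input arrays of packet timestamps and byte counts are hypothetical (the layout of the Bellcore files is not assumed), and the scale factor is estimated here by the sample standard deviation, which is one possible choice when the fGn is taken to have unit variance.

```python
import numpy as np

def work_per_unit(timestamps, sizes, dt):
    """Total bytes recorded in each consecutive interval of length dt seconds."""
    timestamps = np.asarray(timestamps, dtype=float)
    bins = ((timestamps - timestamps.min()) // dt).astype(int)
    return np.bincount(bins, weights=np.asarray(sizes, dtype=float))

def assumed_fgn(work):
    """Remove the mean term and rescale to unit variance."""
    work = np.asarray(work, dtype=float)
    return (work - work.mean()) / work.std(ddof=1)

# Hypothetical packet arrivals (seconds) and sizes (bytes).
times = [0.0, 0.4, 1.2, 2.5, 2.6]
sizes = [100, 50, 60, 10, 20]
work = work_per_unit(times, sizes, dt=1.0)
y = assumed_fgn(work)
```

Repeating the binning with different values of `dt` gives the family of data sets at different time scales.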

3.3  Data  Analysis 

The assumed fGn time-series was obtained from the data sets
using the transform described in section 3.2. In theory, if
the Gaussian assumption is valid, this time-series should be
drawn from a Gaussian distribution. To test for this a true
fGn trace with a suitable H was generated. The pdfs of both
were then calculated and plotted. Figure 1 compares the
pdfs for data sets generated from the trace file pAug.TL.
The fGn pdf can be seen to decay more rapidly in the tails
than any of the work data sets. This result suggests
that outlying events (those far from the mean of the
distribution) are more probable in the real data than in the
Gaussian model (i.e. the real data is more impulsive).

The  previous  results  in  this  section  support,  in  a  qualitat¬ 
ive  manner,  the  hypothesis  that  fGn  innovations  are  not  able 
to  completely  capture  the  behaviour  of  Ethernet  traffic.  In 
the  remainder,  this  hypothesis  is  tested  more  formally. 

Shapiro and Wilk developed a test for normality that has
been shown to outperform other methods [8]. However,
the Shapiro-Wilk (SW) test requires the calculation of n
coefficients where n is the sample size. Due to the large
number of elements in some of the data sets considered in
this paper the large sample approximation to the SW test,
developed by D'Agostino, was employed [4].

This  test  is  based  on  a  statistic,  D,  which  is  a  ratio  of 
the  unbiased  estimate  of  the  population  standard  deviation 
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structed  from  pAugTL.  The  time  step  for  each 
data  set  is  given  as  dt  (in  seconds)  and  a  true 
fGn  case  is  given  as  fGn. 


hence  the  SW  test  and  H  were  calculated  for  W' , 

and  The  results  and  critical  values  for  all  the  data 

sets  are  given  in  Tables  1-3. 


pAug.TL    N         H        c_low     c_high    Y
           314283    0.582    -1.967    1.953     -1218.94
           31428     0.594    -1.982    1.937     -149.36
           3142      0.594    -2.030    1.887     -29.27
           314       0.591    -2.176    1.724     -7.16

Table 1. Results of the adjusted SW test for the Bellcore
trace pAug.TL


pOct.TL 

n 

N 

mm 

130390 

■it-fiCl 

wmm 

mijjjii 

■jyiH 

to the sample standard deviation. Assume $X_1, \ldots, X_N$ is
a sample set of size N. If the set is ordered such that
$X_{1,N} \le X_{2,N} \le \cdots \le X_{N,N}$, the statistic D is given by

$$D = \frac{T}{N^2 S}, \qquad (12)$$

where

$$T = \sum_{i=1}^{N} \left( i - \frac{N+1}{2} \right) X_{i,N}, \qquad (13)$$

$$S^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}, \qquad (14)$$

where $\bar{X}$ is the sample mean. The statistic D has $E[D] \ne 0$ and
$\sigma^2 \ne 1$ but can be normalised to give the statistic Y using

$$Y = \frac{\left( D - (2\sqrt{\pi})^{-1} \right) \sqrt{N}}{0.02998598}. \qquad (15)$$

If the set X is not drawn from a normal distribution then
$E[Y] \ne 0$. The critical levels for a (1 - α) confidence level
can be obtained from tables or by calculation [4].
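
Equations (12)-(15) translate directly into a few lines of code. The sketch below follows the usual statement of D'Agostino's D statistic; the function name is hypothetical and the critical values are not computed here.

```python
import numpy as np

def dagostino_y(x):
    """Normalised D'Agostino statistic Y of equations (12)-(15)."""
    x = np.sort(np.asarray(x, dtype=float))       # order statistics X_{i,N}
    n = x.size
    i = np.arange(1, n + 1)
    t = np.sum((i - 0.5 * (n + 1)) * x)           # equation (13)
    s = np.sqrt(np.mean((x - x.mean()) ** 2))     # equation (14)
    d = t / (n ** 2 * s)                          # equation (12)
    return np.sqrt(n) * (d - 1.0 / (2.0 * np.sqrt(np.pi))) / 0.02998598  # (15)

rng = np.random.default_rng(0)
y_gauss = dagostino_y(rng.standard_normal(20000))
y_laplace = dagostino_y(rng.laplace(size=20000))
```

For Gaussian data Y stays close to zero, whereas a heavier-tailed sample such as the Laplace draw gives a strongly negative Y, the same sign as the values reported for the Bellcore data sets.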

The  data  sets  \  W'  and  W'°  for  pAug .  TL 

and pOct.TL were transformed into the assumed fGn series as in section 3.2
and then tested using the adapted SW test, and the statistic
Y was recorded. The critical values for a .95 confidence
level were calculated for the required N ($c_{low}$ and $c_{high}$).
Also, the Hurst exponent estimate, H, for each of the
assumed fGn data sets was calculated using the Whittle
approximation method (chapters 5 and 6 in [2]). The trace
file OctExt.TL was gathered over a longer time scale and

Table  2  .  Results  of  the  adjusted  SW  test  for  the  Bellcore 
trace  pOct.TL 


OctExt.TL  N         H        c_low     c_high    Y
           122798    0.613    -1.971    1.948     -1488.39
           12279     0.634    -1.996    1.923     -403.53
           1227      0.621    -2.072    1.843     -106.43
           122       0.579    -2.298    1.572     -25.77

Table 3. Results of the adjusted SW test for the Bellcore
trace OctExt.TL


The results in the tables above show that at a .95
confidence level, all the data sets for all the Bellcore traces failed
the adjusted Shapiro-Wilk test. Therefore the assumed
fGn data sets produced from the Ethernet traces cannot be true
fGn (which would be expected to pass the SW test). It might
be possible from this to conclude that the Ethernet data does
not conform to the models introduced in section 2.

It was possible that the self-similar nature of the data
could be affecting the results. Equation (4) illustrates that,
due to LRD, non-parametric measures of self-similar
time-series can suffer from convergence problems. It was possible
that the LRD in the Ethernet data affected the Shapiro-Wilk
test results. In order to investigate this, true fGn traces with
varying H and length (N) were constructed and tested using
the Shapiro-Wilk test with a .95 confidence level. This
experiment was repeated 200 times and the percentage pass
rates are recorded in Table 4.




             True fGn trace H
N            0.5     0.6     0.7     0.8     0.9     1.0
1024         80.5    81.5    84      76.5    77      85.5
2048         87.5    90.5    89.5    91.5    87.5    82.5
4096         96      96      95      92      87.5    73.5
8192         97.5    96      96.6    95.5    88.5    59.5
16384        98      97      98      96      85.5    41
32768        98.5    99      99.5    97.5    80.5    54
65536        99      97.5    98      98      89.5    46.5
131072       99      99      99      97      86.5    36.5
262144       99      98.5    98      98.5    79      16
524288       99.5    99.5    100     97.5    81      17.5
1048576      99      100     98.5    97      78      18

Table 4. Percentage of the SW tests that passed
for true fGn traces.

The bold entries in Table 4 indicate where the fGn traces
seem to pass the SW test the expected number of times for
a .95 confidence level. Extreme values of H (i.e. H > 0.9)
and low values of N seem to produce fewer than expected
hypothesis passes and when H = 1 the pass rate decreases
with N. This is due to the fact that a time-series with such a
large H possesses very strong correlations over all time scales
and any non-parametric measure (such as those in (12)-(14))
will not converge for any N. If we compare the lengths and
Hurst exponent estimates for the Bellcore data sets we can
see that most occur inside the bolded area of Table 4. This
suggests that LRD is not the reason for the failure of the SW
tests for the Bellcore data.

4  Conclusions 

This paper has presented evidence, both qualitative and
quantitative, to suggest that Ethernet data does not conform
to popular self-similar models. The evidence suggests
that Ethernet traffic is more impulsive than the Gaussian case which
these models assume. One alternative is to use the more
general class of stable distributions, which can be much more
impulsive than the Gaussian case. In [1] the parameters for
a stable distribution were estimated for both the Bellcore
Ethernet data and similar data generated in Edinburgh. The
initial results gave sensible estimates for these parameters and
on-going work involves the construction of suitable models
that incorporate these distributions.

Abstract

The spectrum of a signal subjected to sampling jitter
can be significantly different from the spectrum of
the same signal sampled without jitter. The first part
of the paper shows that the spectrum of a continuous
Gaussian signal can be reconstructed from a combined
use of the sampled (with jitter) signal second and
fourth-order statistics. This spectral reconstruction is
then used to detect the presence or absence of jitter in a
sampled signal. A likelihood ratio detector based on the
spectral corrective term is studied. It gives a reference
to which suboptimal detectors can be compared.


1.  Introduction 

Sampling jitter detection and estimation have
been investigated in many applications. These
applications include spectral estimation [8] and source
localization [10]. The sampling jitter affects the sampled signal
Power Spectral Density (PSD). It is usually modeled
as a random process whose variance characterizes the
spectral degradation. This paper studies the problem
of sampling jitter detection for Gaussian signals with
unknown spectral properties.

The problem of sampling jitter detection was
recently considered in [12] (using the bispectrum of
sampled data). However, the study was restricted to
zero-mean stationary band-limited continuous processes,
with nonzero third-order cumulant function.
Consequently, the results cannot be applied to continuous
Gaussian processes. The estimation and detection of
sampling jitter were also considered in [1] and [2].
However, the study was restricted to signals with known or
specific properties.

The paper is organized as follows. Section 3 shows
that the PSD of the Gaussian signal can be determined
as a function of the measured second- and fourth-order
sample statistics. Section 4 proposes a suboptimal
hypothesis test for sampling jitter detection. Simulation
results and conclusions are reported in the last two
sections.

2.  Problem  Formulation 

A real stationary band-limited Gaussian process
$X(t)$ is considered, with unknown PSD $s(\omega)$
($s(\omega) = 0$ for $|\omega| > \pi/h$) and autocorrelation
function $r(\tau)$. $X(t)$ is sampled according to the
Shannon theorem by a sampler that exhibits jitter. The
sampling instants are $t_n = nh + \gamma_n$, where $\gamma_n$
is a zero-mean i.i.d. sequence with unknown characteristic
function $\Phi(\omega) = E[\exp(j\omega\gamma_n)]$. The random
variable $\gamma_n$ is assumed to be symmetrically distributed,
so that $\Phi(\omega)$ is real and even. The parameter $h$ is
chosen equal to 1 without loss of generality. The samples
are denoted $x_n = X(t_n)$. The first part of the paper
addresses the problem of estimating the PSD $s(\omega)$
using the observations $x_n$.


3.  Spectral  Estimation 


3.1.  Second-Order  Statistics 


The second-order statistics of $x_n$ can be computed
as functions of the PSD $s(\omega)$ and the jitter
characteristic function $\Phi(\omega)$ (using conditional
expectation with respect to $t_{n+p} - t_n$). Denote
$r_h(m) = E[x_n x_{n+m}]$. The following results are
obtained [4]:

$$r_h(m) = \int_{-\pi}^{\pi} |\Phi(u)|^2 \, s(u) \, e^{jmu} \, du, \qquad m \neq 0$$

$$r_h(0) = \int_{-\pi}^{\pi} s(u) \, du = r(0), \qquad m = 0$$
The PSD of $x_n$ can then be determined:

$$\frac{1}{2\pi} \sum_{m=-\infty}^{+\infty} r_h(m) \, e^{-jm\omega} = |\Phi(\omega)|^2 \, s(\omega) + \frac{1}{2\pi} \int_{-\pi}^{\pi} \left(1 - |\Phi(u)|^2\right) s(u) \, du$$

The modulus of $\Phi(u)$ being upper bounded by 1, the
integral in the previous equation is positive. Consequently,
the jitter effect can be interpreted at the second order
as a filtering process embedded in additive noise.
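The filtering-plus-noise interpretation can be checked numerically. The sketch below is only an illustration. It assumes the second-order relation has the form $|\Phi(\omega)|^2 s(\omega) + \frac{1}{2\pi}\int_{-\pi}^{\pi}(1 - |\Phi(u)|^2) s(u)\,du$ with $h = 1$, and it borrows the PSD $s(\omega) = 1/(a^2 + \omega^2)$ and the binary-jitter characteristic function $\Phi(\omega) = \cos(\omega\sigma)$ from the simulation section, simply truncating $s$ to $[-\pi, \pi]$. The function names and the values $a = 2$, $\sigma = 0.3$ are hypothetical.

```python
import numpy as np


def trapezoid(y, dx):
    """Trapezoidal rule on a uniform grid with spacing dx."""
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))


def jittered_psd(omega, s, phi):
    """Second-order effect of jitter: filtered PSD plus a flat noise floor.

    omega : uniform frequency grid covering [-pi, pi]
    s     : PSD of the continuous process on that grid
    phi   : jitter characteristic function on that grid
    """
    d_omega = omega[1] - omega[0]
    filtered = np.abs(phi) ** 2 * s                  # |Phi|^2 s : filtering term
    leaked = (1.0 - np.abs(phi) ** 2) * s            # power removed by the filter
    floor = trapezoid(leaked, d_omega) / (2.0 * np.pi)  # spread as white noise
    return filtered + floor, floor


# Hypothetical example values (not taken from the paper's experiments).
omega = np.linspace(-np.pi, np.pi, 2001)
a, sigma = 2.0, 0.3
s = 1.0 / (a ** 2 + omega ** 2)
phi = np.cos(omega * sigma)
s_h, floor = jittered_psd(omega, s, phi)
```

Under these assumptions the total power is unchanged by jitter: the power removed by the factor $|\Phi(\omega)|^2$ reappears as the flat floor, which is why the second-order effect looks like filtering plus white noise.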


the  PSD  s  (uj)  as  a  function  of  Sh  (u)  and  the  trispec¬ 
trum  T/i  (.),  which  is  independent  of  the  characteristic 
function  $  (^)-  Denote  u  =  v  ~  coi  -\-uJ2  ^nd 
define  a  new  function  F/i  (u,  v)  =  irTh 

Th{u,v)  =  -^  (2) 

l,P\P^Q 

It  can  be  shown  that  [4]: 

s  (w)  Sh  (uj)  =  2  r  ( 'J  du  +  Sh  (Ujf 
*'0  \  J  ^——11 


3.2.  Fourth-Order Cumulants

The fourth-order cumulants of the sampled signal $x_n$,
denoted $C_h(k,l,m)$, can be computed as functions of the
PSD $s(\omega)$ and the jitter characteristic function
$\Phi(\omega)$. The only non-zero fourth-order cumulants of
$x_n$ are $C_h(k,l,l)$ with $k \neq l$ [4][9]:

•  first case: $k = 0$, $l \neq 0$

$$C_h(0,l,l) = \iint f(u,v) \, e^{jl(u+v)} \, s(u) \, s(v) \, du \, dv$$

with $f(u,v) = 2\left[|\Phi(u+v)|^2 - |\Phi(u)\Phi(v)|^2\right]$.

•  second case: $k \neq 0$, $l \neq k$

$$C_h(k,l,l) = \iint g(u,v) \, e^{j[lu + (l-k)v]} \, s(u) \, s(v) \, du \, dv$$

with $g(u,v)$ defined below.


A closed form expression of the above integral as a
function of the fourth-order cumulants of $x_n$ can then
be derived:


sine  (ku}/2)  sin  (ktj/2)  (1  -  A:)  Ch  (k,  1, 1)  (3) 

Ijik 

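The PSD $s(\omega)$ is then related to the PSD
$\tilde{s}_h(\omega) = |\Phi(\omega)|^2 s(\omega)$ of eq. (1) and to
the cumulants $C_h(k,l,l)$ by the following relation:

$$s(\omega) \, \tilde{s}_h(\omega) = 2I(\omega) + \tilde{s}_h(\omega)^2 \qquad (4)$$

Note that eq. (4) leads to $s(\omega) = \tilde{s}_h(\omega)$ in
the absence of jitter.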


$$g(u,v) = 2\left[\Phi(u+v) \, \Phi^*(u) \, \Phi^*(v) - |\Phi(u)\Phi(v)|^2\right]$$


4.  Sampling  Jitter  Detection 


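Define the trispectrum of $x_n$ in the usual domain
$|\omega_1|, |\omega_2|, |\omega_3| < \pi$, $|\omega_1 + \omega_2 + \omega_3| < \pi$:

$$T_h(\omega_1,\omega_2,\omega_3) = \frac{1}{(2\pi)^3} \sum_{l,p;\, p \neq 0} C_h(l-p,l,l) \, e^{-j[(l-p)\omega_1 + l(\omega_2 + \omega_3)]}$$

The previous cumulant expressions allow the trispectrum
$T_h(\omega_1,\omega_2,\omega_3)$ to be expressed as a function
of $\Phi(\cdot)$ and $s(\cdot)$.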

3.3.  Spectral  Reconstruction 

Consider a new process whose autocorrelation
function and PSD are:

$$\tilde{r}_h(m) = \int_{-\pi}^{\pi} |\Phi(u)|^2 \, s(u) \, e^{jmu} \, du$$

$$\tilde{s}_h(\omega) = |\Phi(\omega)|^2 \, s(\omega) \qquad (1)$$

It has been shown that $\tilde{r}_h(m)$ can be estimated with
two samplers with independent jitters [11]. This section
derives a closed form expression of

4.1.  Simple  Binary  Hypothesis  Test 

The main contribution of this work is to use eq. (4)
for sampling jitter detection. The sampling jitter
detection problem can be expressed as a binary hypothesis
testing problem:

$H_0$ (no jitter): $x_n = X(n)$
$H_1$ (jitter): $x_n = X(n + \gamma_n)$    (5)

with $n \in \{1, \ldots, N\}$. Given the statistics of the jitter
and the continuous process, a Likelihood Ratio Test
(LRT) can be derived. Unfortunately, the test statistic
is usually difficult to study. Moreover, the statistical
properties of the jitter are unknown in many practical
applications. This section proposes a suboptimal
sampling jitter detector. This detector does not
require any assumption concerning the jitter statistics
and the signal parameters. The problem (5) can be
reformulated as the following binary hypothesis testing
problem:

$H_0$ (no jitter): $I(\omega) = 0 \quad \forall \omega \in \, ]0, \pi]$
$H_1$ (jitter): $I(\omega) \neq 0 \quad \forall \omega \in \Omega$    (6)

In eq. (6), $\Omega$ is a non-empty subset included in $]0, \pi]$.
The detection problem can then be discretized as follows
(considering different frequencies $\omega_k = k\pi/M$):

$H_0$ (no jitter): $I_M = 0$    (7)
$H_1$ (jitter): $I_M \neq 0$

where $I_M = \left(I(\pi/M), I(2\pi/M), \ldots, I(\pi)\right)^T$. The
computation of the vector $I_M$ involves infinite summations
(see eq. (3)). However, these summations can be truncated
using the fourth-order mixing properties of $x_n$ and
the decay of sinc$(\cdot)$. Simulation results were shown
to be in good agreement with mathematical derivations
for $|k|, |l| < 6$ in [4]. Consequently, the test
is performed using the corresponding approximated
(truncated) vector $I_M$. The Neyman-Pearson Detector
(NPD) [13] for (7) can be expressed as (assuming the
parameters $I_M$, $\Sigma_0$ and $\Sigma_1$ are known):

$H_0$ rejected if $T_{lr} > S(PFA)$    (8)

where $T_{lr}$ is the test statistic and $S(PFA)$ is a threshold
depending on the PFA (defined by $PFA = P[T_{lr} > S \mid H_0]$).
The performance of the NPD can be determined from the
statistical properties of $T_{lr}$.
$T_{lr} = Q_1(\hat{I}_M) - Q_0(\hat{I}_M)$ is the difference between
two quadratic forms of $\hat{I}_M$. The vector $I_M$ is linearly
related to a fourth-order cumulant vector $c_h$. Using the
property that the cumulant vector estimate $\hat{c}_h$ is
asymptotically an unbiased Gaussian vector [5], the
statistics of $\hat{I}_M$ (estimate of $I_M$) can be asymptotically
determined under both hypotheses:

$H_0$ (no jitter): $\hat{I}_M \sim \mathcal{N}(0, \Sigma_0)$
$H_1$ (jitter): $\hat{I}_M \sim \mathcal{N}(I_M, \Sigma_1)$

Consequently, $T_{lr}$ is the difference between two positive
definite quadratic forms of the Gaussian vector $\hat{I}_M$
(under both hypotheses). Unfortunately, the quadratic
form $T_{lr}$ can be indefinite. Relatively little attention
has been devoted to the problem of obtaining the
distribution of indefinite quadratic forms of Gaussian
vectors. Expansions as mixtures of noncentral $\chi^2$
distributions, or in Laguerre or Maclaurin series, were
derived in [7]. Unfortunately, these expansions lead to
high computational cost. Instead, approximations by
Gaussian or Gamma distributions can be used, which lead
to a simple test performance computation. Fig. 1 shows
that the Probability Density Function (PDF) of $T_{lr}$
can be approximated (with sufficient accuracy) by the
Gaussian PDF. The Gaussian assumption for $T_{lr}$ allows
the NPD performance to be studied, for instance in
terms of ROC curves as a function of the jitter variance
(see simulation results).
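The structure of such a detector can be illustrated with a generic Gaussian log-likelihood ratio. The sketch below is not the paper's implementation: it assumes the two hypotheses are $\mathcal{N}(0, \Sigma_0)$ and $\mathcal{N}(I_M, \Sigma_1)$ with known parameters, writes the statistic as a difference of two quadratic forms plus a constant, and sets the threshold from an empirical quantile under $H_0$. The sign convention, the function names and all numerical values are assumptions made for the illustration.

```python
import numpy as np


def log_likelihood_ratio(i_hat, mean1, cov0, cov1):
    """ln p(i_hat | H1) - ln p(i_hat | H0) for H0: N(0, cov0), H1: N(mean1, cov1)."""
    q0 = i_hat @ np.linalg.solve(cov0, i_hat)   # quadratic form under H0
    resid = i_hat - mean1
    q1 = resid @ np.linalg.solve(cov1, resid)   # quadratic form under H1
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (q0 - q1) + 0.5 * (logdet0 - logdet1)


def empirical_threshold(stats_h0, pfa):
    """Threshold exceeded by a fraction pfa of the H0 statistics."""
    return np.quantile(stats_h0, 1.0 - pfa)


# Hypothetical two-frequency example.
mean1 = np.array([1.0, 0.0])
cov0 = np.eye(2)
cov1 = np.eye(2)
rng = np.random.default_rng(1)
h0_draws = rng.normal(size=(2000, 2))
h0_stats = np.array([log_likelihood_ratio(x, mean1, cov0, cov1) for x in h0_draws])
threshold = empirical_threshold(h0_stats, pfa=0.1)
```

When the two covariance matrices differ, the difference of quadratic forms is in general indefinite, which is the difficulty pointed out in the text. The empirical quantile avoids needing its exact distribution, at the cost of Monte Carlo runs.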


4.2.  Composite  Hypothesis  Test 

The NPD maximizes the Probability of Detection
(PD) for a fixed Probability of False Alarm (PFA).
It provides a reference to which suboptimal detectors
can be compared. However, it requires the knowledge
of the parameters $I_M$, $\Sigma_0$ and $\Sigma_1$. These parameters
are unknown in many practical applications and have to
be estimated. The maximum likelihood estimator for
these parameters combined with the NPD yields the
Generalized Likelihood Ratio Detector (GLRD). This
section studies the GLRD [13] for (7), when the
parameters $I_M$, $\Sigma_0$ and $\Sigma_1$ are unknown. The $N$
samples $x_n$ are divided into $K$ records, each having $M$
samples. The GLRD for (7) was shown to be equivalent
to [6][12]:

$H_0$ rejected if $T_{glr} > S(PFA)$    (9)

where the test statistic $T_{glr}$ is computed from $\hat{\mu}$ and
$\hat{\Sigma}$, the usual mean and covariance estimates. The
statistical properties of the test statistic under both
hypotheses were derived in [6]. However, they lead to
a high computational cost. Instead, approximations
using the central or noncentral chi-square distribution
can be used, which lead to a simple test performance
computation [12].

5.  Simulation  Results 

Many simulations have been performed to validate
the previous theoretical results. In this paper, the
simulation results are presented for a Gaussian stationary
process $dX = -aX \, dt + dv$ with $a > 0$ (as in [4]). $v(t)$
is a Wiener process with variance parameter $2\pi$ and
$X(0)$ is a zero-mean Gaussian variable with variance
$\pi/a$. The PSD of this process is $s(\omega) = 1/(a^2 + \omega^2)$.
The discretization of $X(t)$ at instants $t_n$ leads to [3]:

$$x_{n+1} = e^{-a(t_{n+1} - t_n)} \, x_n + \left(1 - e^{-2a(t_{n+1} - t_n)}\right)^{1/2} \epsilon_n$$

where $\epsilon_n$ is an i.i.d. Gaussian sequence with variance
$\pi/a$. For brevity, all experiments have been conducted
with $a = 2$. The jitter is a binary process taking
(equally likely) the values $\{-\sigma, \sigma\}$, where $\sigma^2$ is the
jitter variance. The corresponding jitter characteristic
function is $\Phi(\omega) = \cos(\omega\sigma)$.
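This simulation setup can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' code: it uses the standard exact discretization of the process $dX = -aX\,dt + dv$ between consecutive jittered instants, takes $h = 1$, and requires $\sigma < 0.5$ so that the sampling instants stay ordered. The function names, the seed and the value $\sigma = 0.2$ are hypothetical.

```python
import numpy as np


def simulate_jittered_ou(n_samples, a=2.0, sigma=0.2, seed=0):
    """Sample a stationary Gaussian process with PSD 1/(a^2 + w^2) at t_n = n + gamma_n."""
    rng = np.random.default_rng(seed)
    var = np.pi / a                                    # stationary variance r(0)
    gamma = sigma * rng.choice([-1.0, 1.0], size=n_samples)  # binary jitter
    t = np.arange(n_samples) + gamma                   # jittered instants
    rho = np.exp(-a * np.diff(t))                      # decay between instants
    eps = rng.normal(0.0, np.sqrt(var), size=n_samples)
    x = np.empty(n_samples)
    x[0] = eps[0]                                      # start in the stationary law
    for n in range(n_samples - 1):
        x[n + 1] = rho[n] * x[n] + np.sqrt(1.0 - rho[n] ** 2) * eps[n + 1]
    return t, x


def theoretical_lag1(a, sigma):
    """E[x_n x_{n+1}] = (pi/a) E[exp(-a (1 + gamma_{n+1} - gamma_n))] for binary jitter."""
    return (np.pi / a) * np.exp(-a) * 0.5 * (1.0 + np.cosh(2.0 * a * sigma))


t, x = simulate_jittered_ou(200_000, a=2.0, sigma=0.2, seed=0)
lag1_estimate = np.mean(x[:-1] * x[1:])
```

Comparing `lag1_estimate` with `theoretical_lag1(2.0, 0.2)` gives a quick sanity check that the jitter changes the lag-one correlation while leaving the variance $\pi/a$ untouched.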

5.1.  Spectral  Reconstruction 

Eq. (4) can be used for PSD estimation:

$$s(\omega) = \frac{2I(\omega) + \tilde{s}_h(\omega)^2}{\tilde{s}_h(\omega)}$$

However, a division by $\tilde{s}_h(\omega)$ is required, which can
lead to numerical problems. A slightly different
estimation scheme is preferred, to cope with these
numerical problems. The spectra $s(\omega)$ and $\tilde{s}_h(\omega)$ are very
similar in practical applications. Eq. (4) can then be
approximated by

$$s(\omega)^2 \approx \tilde{s}_h(\omega)^2 + 2I(\omega)$$

The cumulants $C_h(k,l,l)$ are estimated for $|k|, |l| < 6$,
$k \neq l$, by averaging the estimates obtained on segments
of $N/30$ samples with 10% overlap (N = 10^).
Fig. 2 shows the theoretical PSD's $s(\omega)$, $\tilde{s}_h(\omega)$ and
their estimates. As can be seen, mathematical derivations
and simulation results are in good agreement.

5.2.  Binary Hypothesis Test

The detection performance is studied for different
numbers of samples and different values of the jitter
variance. Fig. 3 shows the ROC curves corresponding to
(8) (obtained for known parameters $I_M$, $\Sigma_0$ and $\Sigma_1$).
As can be seen, a large number of samples is required
to obtain a good jitter detection performance (even for
large jitter variance). This means that the jitter has
little effect on the PSD (as can be checked analytically
using the expressions of $\Phi(\omega)$, $s(\omega)$ and $\tilde{s}_h(\omega)$). ROC
curves corresponding to a high-pass signal (corrupted
by binary sampling jitter) are currently under
investigation.

Note that the test can be significantly improved
when the truncation bounds in (3) and $M$ tend to
infinity as the number of samples $N$ tends to infinity.
Unfortunately, the asymptotic distribution of $\hat{I}_M$ is not
easy to derive.

Normality tests can be a useful tool for detecting the
presence or absence of jitter. However, the interesting
problem is to detect the significant spectral degradations
due to jitter (and not the deviations from normality),
in a spectral estimation context. The problem (7)
is therefore to be preferred to any normality test in these
spectral estimation applications. Moreover, deviations
from normality are not always inherent to sampling
jitter.

6.  Conclusion 

An original solution to the spectral estimation problem
in the presence of jitter was presented. The estimation
procedure was restricted to continuous Gaussian
signals but did not require any additional assumption.
A likelihood ratio detector was derived to decide
whether spectral distortions due to sampling jitter are
significant or not. The proposed test was suboptimal
since it did not work on the data themselves. However,
it did not require any statistical assumption for
the jitter process.

Figure 1. Histograms and approximated PDF
of the test statistic with 95% confidence intervals
a) under $H_0$ b) under $H_1$.

Figure 2. PSD's estimates.

Figure 3. ROC curves for the Neyman-Pearson
Detector a) N = 10^ b) N = 10® c) N = 5.10®.
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Abstract 


In this paper, we consider some independent problems
concerning HOS and nonGaussian long range
dependent processes. We exhibit HOS for non-integer
FARIMA processes, showing that singularities may occur
in the multispectra. A class of nonGaussian processes
constructed by a multiplicative process on the
dyadic tree is then considered. Finally, long range
dependence problems in the Fourier transform are
examined.


1  Introduction

Many processes occurring in fields as different as
physics, hydrology, economics, ... appear to present
long memory, that is, a very slow decay of their
correlation function. Furthermore, in fields where
nonlinear phenomena are involved, measured signals are
often well modelled using nonGaussian random functions.
It therefore seems important to consider nonGaussian
stochastic processes that present long memory.

The aim of this paper is not to give a full theory
of nonGaussian processes with long memory. Instead,
we will try to give some ideas concerning HOS and
nonGaussian long memory processes. We first consider
FARIMA processes, which are easy to handle since they
are linear processes. We examine some of their HOS
properties and show the complexity of these statistics.
We then present a construction of nonGaussian processes
based on a multiplicative process on the dyadic
tree. Some studies are developed, both on the process
and on the process reconstructed using a Haar basis,
exhibiting long range memory features. Finally, we
will provide a short discussion about long range
dependence, Gaussianity and the Fourier transform.


2  FARIMA models

This section is devoted to the study of FARIMA
models. We first recall some results concerning
second-order statistics before examining some higher-order
properties, especially on the bispectrum.

2.1  Second-Order  Statistics 


FARIMA models are parametric models now widely
used to model long range dependent stationary processes.
They are defined as follows. Let $B$ be the unit
delay operator, $B x_n = x_{n-1}$. The operator $\nabla = I - B$,
where $I$ is the identity operator, is the discrete
differencing operator. A FARIMA(0,d,0) process $x_n$ is a
fractional integration of an i.i.d. sequence, that is to say a
solution of $\nabla^d x_n = \epsilon_n$, where $\epsilon_n$ is an i.i.d. sequence.
It can be shown [2] that, if $d \in [-0.5, 0.5]$, there is a
unique stationary process $x_n$ solution of $\nabla^d x_n = \epsilon_n$.
It can be formally written $x_n = \nabla^{-d} \epsilon_n$. Using the
binomial expansion, $x_n$ can be written

$$x_n = \sum_{k=0}^{+\infty} \psi_k \, \epsilon_{n-k} \quad \text{with} \quad \psi_k = \frac{\Gamma(k+d)}{\Gamma(k+1)\,\Gamma(d)}, \qquad (1)$$

$\Gamma(t)$ being the Gamma function. Second-order statistics
of FARIMA(0,d,0) processes are explicitly known.
The correlation $C_{x,2}(k) = \mathrm{Cum}[x_n, x_{n+k}]$ reads

$$C_{x,2}(k) = \sigma_\epsilon^2 \, \frac{(-1)^k \, \Gamma(1-2d)}{\Gamma(1-d+k)\,\Gamma(1-d-k)}$$

whereas the spectral density
$S_{x,2}(\lambda) = \sum_k C_{x,2}(k) \exp(-2i\pi\lambda k)$ is given by

$$S_{x,2}(\lambda) = \sigma_\epsilon^2 \, |2\sin(\pi\lambda)|^{-2d}$$
Interesting conclusions may be drawn from these
expressions. Note that if $d \in [0, 0.5]$, $S_{x,2}(\lambda)$ presents
a singularity near frequency 0, since for $\lambda$ small, one
gets the behavior $S_{x,2}(\lambda) \approx \lambda^{-2d}$. This implies that
for large $k$, the correlation behaves like $k^{2d-1}$.
Therefore, the correlation is not summable since its decay is
slower than $1/k$. This justifies the terminology "long
range dependence" or "long memory".
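The slow decay of the filter weights can be made concrete with a short numerical sketch. It is an illustration only: the weights $\psi_k = \Gamma(k+d)/(\Gamma(k+1)\Gamma(d))$ are computed with the recursion $\psi_k = \psi_{k-1}(k-1+d)/k$, and a FARIMA(0,d,0) path is approximated by truncating the infinite moving average. The truncation length, the centered exponential innovations (chosen only because they are i.i.d. and nonGaussian) and the function names are assumptions.

```python
import math

import numpy as np


def farima_weights(d, n_weights):
    """psi_k = Gamma(k+d) / (Gamma(k+1) Gamma(d)), via psi_k = psi_{k-1} (k-1+d)/k."""
    psi = np.empty(n_weights)
    psi[0] = 1.0
    for k in range(1, n_weights):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    return psi


def simulate_farima(d, n_samples, n_weights=2000, seed=0):
    """Truncated moving-average approximation of a FARIMA(0,d,0) path."""
    rng = np.random.default_rng(seed)
    psi = farima_weights(d, n_weights)
    # Centered exponential innovations: i.i.d., zero mean, nonGaussian.
    eps = rng.exponential(1.0, size=n_samples + n_weights - 1) - 1.0
    return np.convolve(eps, psi, mode="valid")


d = 0.3
psi = farima_weights(d, 5000)
path = simulate_farima(d, n_samples=1000)
# Ratio to the power law k^(d-1) / Gamma(d); it approaches 1 for large k.
tail_ratio = psi[-1] * (len(psi) - 1) ** (1.0 - d) * math.gamma(d)
```

For $0 < d < 0.5$ the weights decay like $k^{d-1}$ and are therefore not summable, which is the filter-side counterpart of the non-summable correlation discussed above. The truncation necessarily cuts this slowly decaying tail, so the simulated path only approximates the long-memory behavior.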

FARIMA(0,d,0) models may be generalized to
FARIMA(p,d,q) models. This is done by filtering
a FARIMA(0,d,0) process by an ARMA(p,q) filter.
Therefore, $x_n$ is a FARIMA(p,d,q) process if it is a
solution of $a(B) \nabla^d x_n = b(B) \epsilon_n$, where $\epsilon_n$ is an i.i.d.
sequence. The spectral density of this process is clearly

$$S_{x,2}(\lambda) = \sigma_\epsilon^2 \, |2\sin(\pi\lambda)|^{-2d} \left| \frac{b(e^{-2i\pi\lambda})}{a(e^{-2i\pi\lambda})} \right|^2$$

However, no explicit form for the correlation function
is known. But the asymptotic behavior of $C_{x,2}(k)$ is
again like $k^{2d-1}$, showing that FARIMA(p,d,q)
processes have long memory whenever $d \in [0, 0.5]$. Hence,
the ARMA terms model the short-duration correlations.

2.2  HOS  for  FARIMA(0,d,0)  processes 

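Let $x_n$ be a FARIMA(0,d,0) process, whose driving
i.i.d. sequence is denoted by $\epsilon_n$ and is assumed
nonGaussian. $x_n$ results from the filtering of white noise
$\epsilon_n$ by a filter of impulse response $\psi_k$. It may be argued
that $\psi_k$ has an infinite extent and that $x_n$ should tend
to a Gaussian process. However, classical central limit
theorems do not apply if $d \in [0, 0.5]$ since $\psi_k$ is not
summable. Indeed, $\psi_k$ is slowly decaying to zero as
$k^{d-1}$. We can then evaluate HOS of $x_n$.
Let $C_{x,p}(\mathbf{k}) = \mathrm{Cum}[x_n, x_{n+k_1}, \ldots, x_{n+k_{p-1}}]$ be the
p-th order multicorrelation of $x_n$ and
$S_{x,p}(\boldsymbol{\lambda}) = \sum_{\mathbf{k}} C_{x,p}(\mathbf{k}) \exp(-2i\pi \mathbf{k}^T \boldsymbol{\lambda})$ its p-th order multispectral
density ($\mathbf{k} = (k_1, \ldots, k_{p-1})$, $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_{p-1})$ and $^T$
stands for transposition). $x_n$ being the output of a linear,
time-invariant filter driven by $\epsilon_n$, it is well known
that

$$S_{x,p}(\boldsymbol{\lambda}) = H\!\left(-\sum_{i=1}^{p-1} \lambda_i\right) H(\lambda_1) \cdots H(\lambda_{p-1}) \, S_{\epsilon,p}(\boldsymbol{\lambda})$$

where $H(\lambda)$ is the transfer function associated with
the impulse response $\psi_k$. Since $\epsilon_n$ is an i.i.d. sequence,
its multispectral densities are constant, or $S_{\epsilon,p}(\boldsymbol{\lambda}) = \gamma_{\epsilon,p}$.
Furthermore, $H(\lambda)$ is simply $[1 - \exp(-2i\pi\lambda)]^{-d}$
or $\exp(i\pi\lambda d)[2i\sin(\pi\lambda)]^{-d}$. Then, the p-th order
multispectral density of $x_n$ reads

$$S_{x,p}(\boldsymbol{\lambda}) = \gamma_{\epsilon,p} \, (2i)^{-pd} \left[ \sin\!\left(\pi \sum_{i=1}^{p-1} \lambda_i\right) \prod_{i=1}^{p-1} \sin(\pi\lambda_i) \right]^{-d}$$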

Therefore, when $d \in [0, 0.5]$, the multispectral densities
present singularities on the following manifolds:

$$\lambda_i = 0, \quad i = 1, \ldots, p-1, \qquad \text{and} \qquad \sum_{i=1}^{p-1} \lambda_i = 0$$

These singularities behave like power laws, and
furthermore are not the same on all the manifolds. To
get some more ideas on this behavior, we restrict the
discussion to the bispectrum, which reads

$$S_{x,3}(\boldsymbol{\lambda}) = \gamma_{\epsilon,3} \, (8i)^{-d} \left[ \sin(\pi\lambda_1) \sin(\pi\lambda_2) \sin(\pi(\lambda_1 + \lambda_2)) \right]^{-d} \qquad (2)$$

Suppose now that $\lambda_1, \lambda_2 \ll 1$ and that $\lambda_1$ and $\lambda_2$ are
of the same order. Then, the bispectrum (2) may be
approximated by

$$S_{x,3}(\boldsymbol{\lambda}) \approx \gamma_{\epsilon,3} \, (8i\pi^3)^{-d} \left[ \lambda_1 \lambda_2 (\lambda_1 + \lambda_2) \right]^{-d}$$

This approximation is no longer valid if the frequencies
are not of the same order. For example, let $\lambda_2 \gg 0$
and let $\lambda_1$ go to zero. Then, the approximation
is $S_{x,3}(\boldsymbol{\lambda}) \approx \gamma_{\epsilon,3} (-2i)^{-3d} \sin^{-2d}(\pi\lambda_2) (\pi\lambda_1)^{-d}$. This
gives the singularity strength on the axis $\lambda_1 = 0$ when
$\lambda_2$ is fixed. $S_{x,3}(\boldsymbol{\lambda})$ is depicted in figure (1), where the
behavior described above appears clearly.
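The power-law singularities can be checked numerically from the transfer function alone. The following sketch is an illustration under stated assumptions: it builds the bispectrum as $\gamma_{\epsilon,3} H(\lambda_1) H(\lambda_2) H^*(\lambda_1 + \lambda_2)$ with $H(\lambda) = [1 - \exp(-2i\pi\lambda)]^{-d}$, uses the principal branch of the complex power, and compares its modulus with $|\gamma_{\epsilon,3}| \, |8 \sin(\pi\lambda_1)\sin(\pi\lambda_2)\sin(\pi(\lambda_1+\lambda_2))|^{-d}$. The function names and the value $d = 0.3$ are hypothetical.

```python
import numpy as np


def transfer(lam, d):
    """H(lambda) = [1 - exp(-2 i pi lambda)]^(-d), principal branch."""
    return (1.0 - np.exp(-2j * np.pi * lam)) ** (-d)


def bispectrum(lam1, lam2, d, gamma3=1.0):
    """Bispectrum of a linear process with i.i.d. input of third cumulant gamma3."""
    return gamma3 * transfer(lam1, d) * transfer(lam2, d) * np.conj(transfer(lam1 + lam2, d))


def bispectrum_modulus(lam1, lam2, d, gamma3=1.0):
    """Closed-form modulus: |gamma3| |8 sin sin sin|^(-d)."""
    prod = np.sin(np.pi * lam1) * np.sin(np.pi * lam2) * np.sin(np.pi * (lam1 + lam2))
    return abs(gamma3) * np.abs(8.0 * prod) ** (-d)


d = 0.3
# Growth of the modulus when lambda_1 shrinks by a factor 10 at fixed lambda_2.
growth = bispectrum_modulus(1e-5, 0.2, d) / bispectrum_modulus(1e-4, 0.2, d)
```

With $\lambda_2$ fixed, dividing $\lambda_1$ by 10 multiplies the modulus by about $10^{d}$, in line with the $(\pi\lambda_1)^{-d}$ singularity strength on the axis $\lambda_1 = 0$.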


Figure 1. Theoretical bispectrum of a
FARIMA(0,d,0) process (log of the modulus).


We now turn to the multicorrelations. We unfortunately
have not found a closed form expression for
them in the simple case of FARIMA(0,d,0) processes.
However, we can provide some heuristic arguments to
study the behavior of the bicorrelation. It reads

$$C_{x,3}(k_1, k_2) = \iint S_{x,3}(\boldsymbol{\lambda}) \, e^{2i\pi(k_1\lambda_1 + k_2\lambda_2)} \, d\lambda_1 \, d\lambda_2$$

The heuristic argument is based on the fact
that the bispectrum is correctly approximated by
$\gamma_{\epsilon,3} (8i\pi^3)^{-d} [\lambda_1 \lambda_2 (\lambda_1 + \lambda_2)]^{-d}$ for small frequencies.
But small frequencies correspond to long time lags in
the bicorrelation. Therefore, to get the behavior of the
bicorrelation at infinity we can study the integral

$$\iint [\lambda_1 \lambda_2 (\lambda_1 + \lambda_2)]^{-d} \, e^{2i\pi(k_1\lambda_1 + k_2\lambda_2)} \, d\lambda_1 \, d\lambda_2$$

Let us study the bicorrelation on its diagonal
$k_1 = k_2 = k$. Changing variables $\lambda_1, \lambda_2$ to variables
$u_1 = k\lambda_1$, $u_2 = k\lambda_2$ leads to

$$C_{x,3}(k,k) = k^{3d-2} \iint \frac{e^{2i\pi(u_1 + u_2)}}{[u_1 u_2 (u_1 + u_2)]^{d}} \, du_1 \, du_2$$

Since the integral is independent of $k$, the asymptotic
behavior of $C_{x,3}(k,k)$ is $k^{3d-2}$. If $d \in [-0.5, 0.5]$, then
$3d - 2 \in [-3.5, -0.5]$. Hence, the long memory
characteristic is again found on the bispectrum (things may
be different at higher orders, since the power laws may
decrease faster than $k^{-1}$).

As can be seen, the study of higher-order properties
of FARIMA models is difficult. The few arguments
presented above lead to the conclusion that the study
of nonGaussian long memory processes will be a hard
task. Further, the complicated behavior of the bispectrum
or of the bicorrelation will make it difficult to
estimate these statistics. For example, smoothing the
biperiodogram will need special care around the
singularity axes, and the study of this kind of estimator will
involve special limit theorems.

To conclude, let us say that the modelling of
nonGaussian signals which present long memory is at its
beginning. In the following section we present another
model, which mimics a physical phenomenon to lead to
nonGaussian 1/f processes.


3  Multiplicative  process  on  the  dyadic 
tree 


Recent studies in turbulence emphasize the modelling
of the energy transfer from large scales to small scales
(energy cascade). In [3], B. Castaing has proposed to
relate the pdf of the velocity increments at a scale $r$ to
the pdf of the velocity increments at a larger scale $L$
via

$$P_{\delta v_r}(x) = \int G_{rL}(\ln a) \, \frac{1}{a} \, P_{\delta v_L}\!\left(\frac{x}{a}\right) d\ln a \qquad (3)$$

Working on the velocity increments is a means to define
the notion of scale. This notion can be made clearer
using the wavelet transform. Therefore, a natural
generalization of the last equation is obtained using the
continuous wavelet transform instead of the velocity
increments [4].

Coming back to (3), we see that in terms of random
variables, the increment at scale $r$ is obtained via the
product of the increment at scale $L$ by an independent
random variable $G$. We can thus mimic this behavior
by defining a multiplicative process on a tree. This
approach has already been considered in [1,4]. The nodes
at a given depth define the details at the corresponding
scale. In order to be able to reconstruct a time
series, we will work on the dyadic tree corresponding
to a decomposition over an orthogonal wavelet basis.

In the next section, we elaborate on the multiplicative
process on the tree, forgetting the interpretation
in terms of details. The following section then deals
with the reconstructed time series.

3.1  Multiplicative  process  on  the  dyadic 
tree 

Consider the process $\{d_{j,k},\ j = 0, 1, \ldots;\ k = 0, \ldots, 2^j - 1\}$
on the dyadic tree defined as

$$d_{j,k} = a_{j,k} \, d_{j-1,[k/2]}$$

where $[\cdot]$ stands for integer part, and where
$\{a_{j,k},\ j = 1, \ldots;\ k = 0, \ldots, 2^j - 1\}$ are i.i.d. random variables.
Since the process is multiplicative, it seems easier to
study it with moments rather than cumulants. Let
$\Gamma_J^{n,m}(k_1, k_2) = E[d_{J,k_1}^n \, d_{J,k_2}^m]$ be a multicorrelation of
order $n + m$ of the process at depth $J$. Let $d(1,2)$ be the
distance on the tree between the two nodes $(J, k_1)$ and
$(J, k_2)$. Let $j_a$ be the depth of their common ancestor on
the tree. Then, there exist a $k$ and a sequence $k_l$ such
that $d_{J,k_i} = d_{j_a,k} \prod_{l=j_a+1}^{J} a_{l,k_l}$. Furthermore, since
$E[d_{j,k}^n] = E[d_{0,0}^n] \, E[a^n]^j$ and $d(1,2) = 2J - 2j_a$, we
obtain

$$\Gamma_J^{n,m}(k_1, k_2) = E[d_{0,0}^{n+m}] \, E[a^{n+m}]^{j_a} \left( E[a^n] \, E[a^m] \right)^{d(1,2)/2}$$

This matrix has a special structure inherited from
the tree. This structure is depicted in figure (2). It is
constant by block, the value in each block depending
on the distance $d(1,2)$.

Note that for a fixed $k_1$, the values taken by the
correlation may decrease exponentially fast with the
number of the block. But since the sizes of the blocks
grow exponentially fast, it is then possible to get a
non-summable multicorrelation.

Indeed, let us study $\Gamma_J^{n,n}(0,l)$. This multicorrelation
may also be written as

$$\Gamma_J^{n,n}(0,l) = E[d_{0,0}^{2n}] \left\{ E[a^{2n}]^J \, \chi_{\{0\}}(l) + \sum_{k=1}^{J} E[a^{2n}]^{J-k} \, E[a^n]^{2k} \, \chi_{[2^{k-1},\, 2^k - 1]}(l) \right\}$$

where $\chi_I$ is the characteristic function of interval $I$.
Since $\Gamma_J^{n,n}(0,l)$ is constant by block, it is easy to evaluate
$S_J = \sum_l \Gamma_J^{n,n}(0,l)$, which reads

$$S_J = E[d_{0,0}^{2n}] \, E[a^{2n}]^J \left[ 1 + \frac{1}{2} \sum_{k=1}^{J} q^k \right]$$

The leading term $q = 2E[a^n]^2 / E[a^{2n}]$ in the geometric
series is always lower than or equal to 2. There are therefore
4 principal cases for the convergence of $S_J$:

•  $E[a^{2n}] > 1$: $\lim_{J \to +\infty} S_J = \infty$, $\forall q$.

•  $E[a^{2n}] < 1$ and $q < 1$: $\lim_{J \to +\infty} S_J = 0$.

•  $E[a^{2n}] < 1$ and $1 < q < 2$: the term in the
brackets goes like $(2E[a^n]^2 / E[a^{2n}])^J$, and thus
$S_J \approx 2^J E[a^n]^{2J}$. Then, convergence occurs if
$E[a^n] < 1/\sqrt{2}$, and we get divergence otherwise.

This allows the construction of a 2nd-order white
process which is long memory at order 4 (for example).
To get an example of such a process, suppose
that the $\{a_{j,k}\}$ are uniformly distributed over $[-\alpha, \alpha]$.
Odd moments are thus equal to zero, and in particular,
we obtain that the $d_{j,k}$ are uncorrelated: they
constitute a 2nd-order white process.

We now find $\alpha$ such that second- and fourth-order
moments exist, but such that $S_J$ is not summable.
According to the previous conditions, we must have
$E[a^4] < 1$, $q = 2E[a^2]^2 / E[a^4] \in [1,2]$ and
$E[a^2] > 1/\sqrt{2}$. We furthermore impose $E[a^2] < 1$ to ensure
$E[d_{j,k}^2] < \infty$.

Even-order moments of the $\{a_{j,k}\}$ are given by
$E[a^{2n}] = \alpha^{2n}/(2n+1)$, and then $q = 10/9 \in [1,2]$.
$E[a^4] < 1$ is verified if $\alpha < \sqrt[4]{5}$, and $1/\sqrt{2} < E[a^2] < 1$
if $\alpha \in [\sqrt{3\sqrt{2}/2}, \sqrt{3}]$. All these conditions are
satisfied if $\alpha \in [\sqrt{3\sqrt{2}/2}, \sqrt[4]{5}]$, leading to a 4th-order long
memory sequence!
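The admissible range for the uniform multipliers can be verified numerically, and the cascade itself is easy to generate. The sketch below is only an illustration: it assumes multipliers uniform on $[-\alpha, \alpha]$, for which $E[a^{2n}] = \alpha^{2n}/(2n+1)$, and sets the root value $d_{0,0} = 1$ for convenience. The function names, the seed and the depth are hypothetical.

```python
import numpy as np


def even_moment(alpha, order):
    """E[a^order] for a uniform on [-alpha, alpha], with order even."""
    return alpha ** order / (order + 1)


def admissible_alpha_interval():
    """Range where 1/sqrt(2) < E[a^2] < 1 and E[a^4] < 1 hold together."""
    lower = np.sqrt(3.0 * np.sqrt(2.0) / 2.0)   # from E[a^2] > 1/sqrt(2)
    upper = 5.0 ** 0.25                         # from E[a^4] < 1
    return lower, upper


def dyadic_cascade(depth, alpha, seed=0):
    """Levels d_{j,k} = a_{j,k} d_{j-1, floor(k/2)} with d_{0,0} = 1 (assumed root)."""
    rng = np.random.default_rng(seed)
    levels = [np.ones(1)]
    for j in range(1, depth + 1):
        a = rng.uniform(-alpha, alpha, size=2 ** j)
        parent = np.repeat(levels[-1], 2)       # node k inherits from floor(k/2)
        levels.append(a * parent)
    return levels


lower, upper = admissible_alpha_interval()
alpha = 0.5 * (lower + upper)
q = 2.0 * even_moment(alpha, 2) ** 2 / even_moment(alpha, 4)
levels = dyadic_cascade(depth=10, alpha=alpha)
```

The interval is narrow (roughly 1.457 to 1.495), and $q$ equals $10/9$ for every $\alpha$ because the factors $\alpha^4$ cancel in the ratio.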

We  now  turn  to  the  study  of  the  process  recon¬ 
structed  from  the  details  via  an  orthogonal  wavelet  ex¬ 
pansion  . 

3.2  Reconstructed process: the Haar basis example

To  get  a  reconstructed  time  series  from  the  details 
studied  previously,  we  choose  the  Haar  basis  for  sim¬ 
plicity.  Thus,  the  reconstructed  series  may  be  written, 


Figure  2.  Structure  of  the  correlation  matrix  for 
the  multiplicative  process.  Numbers  correspond  to 
the  distance  on  the  tree  between  two  nodes. 


using  J  scales 

J  2^  —  1 

Xn  =  >lX[0,2J-l](«)  +  EE  -  k) 

j=0  k=0 

where $\chi_{[0,2^J-1]}(n)$ represents the scaling function
associated with the Haar basis, and where $h(n)$ is the Haar
wavelet.

At depth $J$, we want to study the correlation between
$x$ at time $\alpha$ and $x$ at time $\beta$. Then $(J, \alpha)$ and
$(J, \beta)$ are two nodes of the dyadic tree. Let $c$ be the
depth of their common ancestor node, and let $Z$ be the
value of the reconstructed process at depth $c$ on this
node. We can then write

z = doo{i + ^  ( ri  p 

jzzl  \i  =  l  ) 

where $a_i^{\pm} = \pm a_i$. The signs $\pm$ are the
coefficients of the Haar wavelet $h$. The $+$ sign corresponds
to a branch going to the left and the $-$ sign to a
branch going to the right. Note that we have skipped
the index $k$ in the notation of the variables $a_{j,k}$ since
it has no effect on the result. Now, $x(\alpha)$ and $x(\beta)$ are
functions of $Z$ via

x{a)  =  Z{1+  "J  (  n 

j=c+l  \i=c+l  / 

x{l3)  =  Z{1+  nJli 

k=c+l  V=c+1  / 

where the $b_i$ correspond to the $a_{j,k}$ along the branch
linking $Z$ and $x(\beta)$.

Suppose now that the $a_{j,k}$ have zero mean. Then it
is easy to see that $E(x(\alpha)x(\beta)) = E(Z^2)$. The evaluation
of $E(Z^2)$ is also simple since the $a_j$ are independent
variables. We thus obtain

$$E(Z^2) = E(d_{0,0}^2) \left( 1 + \sum_{i=1}^{c} (2\sigma^2)^i \right) = E(d_{0,0}^2) \, \frac{1 - (2\sigma^2)^{c+1}}{1 - 2\sigma^2}$$

To get the correlation coefficient between $x(\alpha)$ and
$x(\beta)$, we normalize the last result by the corresponding
standard deviations. Finally, if the $\{a_{j,k}\}$ are centered, we
get

$$\mathrm{Corr}[x(\alpha), x(\beta)] = \frac{1 - (2\sigma^2)^{c+1}}{1 - (2\sigma^2)^{J+1}}$$

Suppose that $|\alpha - \beta| = r \gg 1$. In that case, it
is easy to see that the depth of the common ancestor
goes like $c \approx J - \log_2(r)$. Hence, if $2\sigma^2 > 1$ we get

$$\mathrm{Corr}[x(\alpha), x(\beta)] \approx r^{-(1 + \log_2 \sigma^2)}$$

Since $\sigma^2 \in [0.5, 1]$, we have $1 + \log_2(\sigma^2) \in [0, 1]$, showing
that the reconstructed process is 2nd-order long
memory! Furthermore, this process is nonGaussian by
construction and can be used to model velocity
measurements in turbulent fluids.


4  Gaussianity,  Fourier  transform  and 
long  range  dependence 


The spectral domain obtained by Fourier transform
is of particular interest in a large number of applications.
There is a "paradigm" stating that "all" the
signals in the spectral domain are Gaussian. From this
point of view, HOS-based methods are not useful in the
spectral domain.

Let us recall that if $x(n)$ is a discrete time strictly
stationary signal with summable multicorrelations
(mixing conditions), then the discrete Fourier transform

$$X_N(m) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n) \, e^{-2\pi j nm/N} \qquad (4)$$

is asymptotically a complex Gaussian variable.

Therefore, long range dependent processes will not
necessarily lead to Gaussian variables in the frequency
domain.

Let us now examine the following "paradox": let
$X_N(m)$ be the complex signal defined by (4). Then
the signal obtained as $N$ goes to infinity is a Gaussian
signal. Its Fourier transform is therefore a Gaussian
signal: all signals are Gaussian!


This conclusion is obviously false, because one of the hypotheses of the previous theorem is violated. To see that, assume that $x(n)$ is complex valued, non-Gaussian, i.i.d., strictly stationary and strictly circular (all multicorrelations involving different numbers of conjugated and non-conjugated terms are zero). Let us evaluate for example

$$\mathrm{Cum}[X_N(m_1), X_N(m_2), X_N^*(m_3), X_N^*(m_4)] = \frac{\gamma_{x,4}}{N}\, \delta(m_1 + m_2 - m_3 - m_4)$$

which is invariant under $m_i \to m_i + n$. Indeed, it can be written as $\mathrm{Cum}[X_N(m), X_N(m + m_1), X_N^*(m + m_2), X_N^*(m + m_3)] = \frac{\gamma_{x,4}}{N}\, \delta(m_1 - m_2 - m_3)$. But this multicorrelation is not asymptotically summable, since its sum is $(N + 1)|\gamma_{x,4}|/2$.

Therefore, there is no paradox since the multicorrelations of $X_N(m)$ are not summable. $X_N(m)$ is therefore a strange long memory process: its multicorrelations are constant over some hyperplanes. Furthermore, the variables $X_N(m_i)$, $i = 0, \ldots, N - 1$, are uncorrelated, and we find another example of a process with short memory at second order but long memory at fourth order.
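The two properties stated above can be checked numerically. The following is only a sketch under assumptions that are not in the text: the input is taken to be an i.i.d. unit-modulus sequence with uniform phase (circular, non-Gaussian, fourth-order cumulant equal to -1), the DFT length is N = 8, and the cumulant is estimated by averaging over independent realizations. The helper name `cum4` is illustrative.

```python
import numpy as np

def cum4(a, b, c, d):
    """Sample estimate of Cum[a, b, conj(c), conj(d)] for zero-mean data."""
    cc, dc = np.conj(c), np.conj(d)
    mean = np.mean
    return (mean(a * b * cc * dc)
            - mean(a * b) * mean(cc * dc)
            - mean(a * cc) * mean(b * dc)
            - mean(a * dc) * mean(b * cc))

rng = np.random.default_rng(0)
N, R = 8, 200_000                      # DFT length and number of realizations (assumed)
# i.i.d. circular non-Gaussian input: unit modulus, uniform phase
x = np.exp(2j * np.pi * rng.random((R, N)))
X = np.fft.fft(x, axis=1) / np.sqrt(N)  # normalized DFT as in (4)

# m1 + m2 - m3 - m4 = 1 + 2 - 0 - 3 = 0: on the hyperplane
on_plane = cum4(X[:, 1], X[:, 2], X[:, 0], X[:, 3])
# m1 + m2 - m3 - m4 = 1 + 2 - 0 - 2 = 1: off the hyperplane
off_plane = cum4(X[:, 1], X[:, 2], X[:, 0], X[:, 2])
# second-order correlation between two distinct frequency bins
cross_corr = np.mean(X[:, 1] * np.conj(X[:, 2]))
```

Under these assumptions `on_plane` should be close to -1/N, while `off_plane` and `cross_corr` should be close to zero, which is the "uncorrelated at second order, dependent at fourth order" behaviour described in the text.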

5  To  Conclude 

The aim of this paper was to study long range dependent phenomena with the help of HOS. We have pointed out some problems and we hope that this will stimulate new work.

Interesting things may appear when dealing with HOS and long range memory processes. We have shown that long memory can be hidden at the second order and revealed at higher orders. This fact appears in the Fourier transform.

We believe that non-Gaussian processes that present long range memory need to be studied since they appear in many fields where nonlinearity occurs. Further, the modelling of these processes involves new and very interesting emerging techniques.
Abstract

Self-similar processes have recently received increasing attention in the signal processing community, due to their wide applicability in modeling natural phenomena which exhibit "1/f" spectra and/or long-range dependence. On the other hand, the wavelet decomposition has become a very useful tool in describing nonstationary self-similar processes. In this paper, we first investigate the existence and the properties of higher-order statistics of self-similar processes with finite variance. Then, we consider some self-similar processes with infinite variance and study the statistical properties of their wavelet coefficients.


1.  Introduction 

Self-similar stochastic processes are of great interest in the modeling of many natural phenomena which appear in communications, geophysics, hydrology, turbulence or economics. They are closely connected to the so-called "1/f noises" and to signals which exhibit long-range dependence.

They are defined by their invariance in distribution under time or space scaling. More precisely, a stochastic process $\{X(t), t \in \mathbb{R}\}$ is called self-similar with index $H$ (or $H$-self-similar) if, for any $a > 0$, the finite dimensional probability distributions of $\{X(at), t \in \mathbb{R}\}$ are identical to those of $\{a^H X(t), t \in \mathbb{R}\}$. This self-similar stochastic structure within the data makes them ideal candidates for a multiscale analysis, such as wavelet analysis. This fact was already pointed out in the Gaussian case, when the considered model is the fractional Brownian motion (fBm). As shown in [4], [10], the variances of the wavelet coefficients of the fBm increase exponentially with respect to the scale index of the analysis. This characteristic leads to simple methods for the estimation of the parameter $H$ of the fBm.
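As an illustration of such an estimation method, the sketch below regresses the log-variance of Haar detail coefficients on the scale index. It rests on assumptions not taken from the text: ordinary Brownian motion (the fBm with H = 1/2) stands in for the process, the Haar wavelet is used, and the coefficients are normalized so that their variance grows like 2^{j(2H+1)}. The function names are illustrative.

```python
import numpy as np

def haar_detail(x, j):
    """Haar detail coefficients of x at level j (block length 2**j)."""
    L = 2 ** j
    blocks = x[: (len(x) // L) * L].reshape(-1, L)
    half = L // 2
    # first half minus second half of each block, with 2^{-j/2} normalization
    return (blocks[:, :half].sum(axis=1) - blocks[:, half:].sum(axis=1)) / np.sqrt(L)

def estimate_H(x, levels):
    """Estimate H from the slope of log2 Var(c_j) versus j (slope = 2H + 1)."""
    levels = list(levels)
    log_var = [np.log2(np.var(haar_detail(x, j))) for j in levels]
    slope = np.polyfit(levels, log_var, 1)[0]
    return (slope - 1.0) / 2.0

rng = np.random.default_rng(1)
bm = np.cumsum(rng.standard_normal(2 ** 17))   # sampled Brownian motion, H = 1/2
H_hat = estimate_H(bm, range(3, 9))
```

The finest levels are skipped because the discrete sampling slightly distorts the power law there; the choice of levels 3 to 8 is a judgment call, not a prescription from the paper.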

Most of the aforementioned works were concerned with the Gaussian case. However, non-Gaussian stochastic processes are often encountered in many practical situations, in particular when the signal exhibits an impulsive behaviour. Here, we are interested in self-similar processes with non-Gaussian distributions and the higher/lower-order properties of their wavelet coefficients.

The paper is organized as follows: in Section 2 we derive a condition for the existence of the cumulants of the wavelet coefficients of a nonstationary process and give some properties of the cumulants of the wavelet coefficients of self-similar processes. Section 3 is devoted to lower-order properties of $\alpha$-stable self-similar processes. Section 4 concludes the paper.

2.  Higher-Order  Statistics 
2.1.  Background 

Throughout this paper, we will consider zero-mean processes. Second-order stationarity is defined by the invariance of their correlation function under translation in time:

$$\forall (t_1, t_2) \in \mathbb{R}^2, \quad E\{X(t_1)X(t_2)\} = E\{X(0)X(t_2 - t_1)\}.$$

This property provides a full characterization of stationary zero-mean Gaussian processes. Here, we suppose that higher-order statistics exist up to the order $n > 2$. In this case, it is useful to define a (wide-sense) $n$-th order stationarity, which means that the cumulants of order $n$ are functions of $(n - 1)$ variables:

$$\forall p \in \{2, \ldots, n\},\ \forall (t_1, \ldots, t_p) \in \mathbb{R}^p, \quad \mathrm{cum}\{X(t_1), \ldots, X(t_p)\} = \mathrm{cum}\{X(0), X(t_2 - t_1), \ldots, X(t_p - t_1)\}.$$

Note that the higher order statistics of a random process $X(t)$ exist up to the order $n$ when

$$\forall p \in \{2, \ldots, n\},\ \forall (t_1, \ldots, t_p) \in \mathbb{R}^p, \quad |\mathrm{cum}\{X(t_1), \ldots, X(t_p)\}| < \infty.$$
It can be proved that a sufficient condition for this property to hold is that

$$\forall t \in \mathbb{R}, \quad \mathrm{cum}^{(p)}\{|X(t)|\} = \mathrm{cum}\{|X(t)|, \ldots, |X(t)|\} < \infty$$


which  is  also  equivalent  to 


(1) 


A sufficient condition for this property to be satisfied is:

$$\forall p \in \{2, \ldots, n\},\ \forall (k_1, \ldots, k_p) \in \mathbb{Z}^p, \quad E\{|c_j(k_1)| \cdots |c_j(k_p)|\} < \infty$$

and after some calculations we find that this condition is satisfied provided that

$$\forall t \in \mathbb{R}, \quad E\{|X(t)|^n\} < \infty. \qquad (2)$$

A class of processes including many useful self-similar processes is that of nonstationary processes with stationary (possibly fractional) increments. Fractional increments were first considered by Hosking [3], in a discrete framework (fractional ARIMA processes). The fractional difference operator is defined by:

$$\Delta_\tau^D = \begin{cases} \sum_{k=0}^{\infty} (-1)^k \binom{D}{k}\, q^{-k\tau} & \text{if } D \in \mathbb{R}_+^* \\ 1 & \text{if } D = 0 \end{cases}$$

where

$$\binom{D}{k} = \frac{D(D - 1) \cdots (D - k + 1)}{k!}$$

and the symbol $q^{-\tau}$ denotes the time-shift operator: $q^{-\tau}F(t) = F(t - \tau)$. Consequently, we will define the increments of order $D$ (not necessarily integer) of a continuous-time (possibly nonstationary) process $X(t)$ by:

$$\Delta^D X(t; \tau) = \Delta_\tau^D X(t).$$

We will say that a nonstationary process has $n$-th order stationary increments of order $D$ if the increments of order $D$, $\Delta^D X(t; \tau)$ and $\Delta^D X(t; \tau')$, exist and are mutually $n$-th order stationary processes for all $(\tau, \tau')$.
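The coefficients $(-1)^k \binom{D}{k}$ of the fractional difference operator can be generated by a simple recursion. The sketch below is only illustrative: the function name `frac_diff_weights` and the truncation length `K` are assumptions, and truncating the infinite sum is an approximation when $D$ is not an integer.

```python
def frac_diff_weights(D, K):
    """First K coefficients (-1)**k * binom(D, k), k = 0, ..., K-1."""
    w = [1.0]
    for k in range(1, K):
        # ratio of consecutive coefficients: (k - 1 - D) / k
        w.append(w[-1] * (k - 1 - D) / k)
    return w

# Integer orders recover the usual finite differences.
first_order = frac_diff_weights(1, 4)     # coefficients of X(t) - X(t - tau)
second_order = frac_diff_weights(2, 4)    # coefficients of X(t) - 2X(t - tau) + X(t - 2tau)
half_order = frac_diff_weights(0.5, 3)    # a genuinely fractional example
```

For integer $D$ the sequence terminates after $D + 1$ non-zero terms, whereas for fractional $D$ it decays slowly, which is what produces long memory in fractional ARIMA models.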


Vp  G  {2, . . .  ,  n}  ,  VAr  G  Vt  G  M, 


/oo  1 


dt  <  oo. 


By  applying  the  Holder  inequality,  this  leads  to  the  follow¬ 
ing  sufficient  condition  for  the  existence  of  the  cumulants  of 
the  wavelet  coefficients  up  to  the  n-th  order: 


yk  G  G  ■ 


dt  <  OO 


(3) 


A  similar  condition  for  the  second  order  case  was  given  in 
[8].  Subsequently,  it  will  be  assumed  that  this  condition  is 
verified. 

For self-similar processes, we have shown that the cumulants of the wavelet coefficients, when they exist, increase exponentially with respect to the scale index of the analysis. More precisely:

Proposition 1. If $\{c_j(k), k \in \mathbb{Z}\}$ are the wavelet coefficients of an $H$-self-similar process at resolution $2^{-j}$, then their cumulants are such that

$$\mathrm{cum}\{c_j(k_1), \ldots, c_j(k_n)\} = 2^{nj(H + \frac{1}{2})}\, \mathrm{cum}\{c_0(k_1), \ldots, c_0(k_n)\}.$$


2.2.  Wavelet  Decomposition  Properties 

Firstly, we investigate under which conditions the existence of the cumulants of the wavelet coefficients of a nonstationary process $X(t)$ satisfying (2) is guaranteed. Let the coefficients of the wavelet decomposition of the process at the $j$-th resolution level be denoted by:

$$c_j(k) = \frac{1}{2^{j/2}} \int_{-\infty}^{\infty} X(t)\, \psi\!\left(\frac{t}{2^j} - k\right) dt, \quad k \in \mathbb{Z},$$

where $\psi(t)$, the "mother wavelet" [7], is assumed to be real.

We want all cumulants of the wavelet coefficients at the $j$-th level to exist up to the $n$-th order, i.e.:

$$\forall p \in \{2, \ldots, n\},\ \forall (k_1, \ldots, k_p) \in \mathbb{Z}^p, \quad |\mathrm{cum}\{c_j(k_1), \ldots, c_j(k_p)\}| < \infty$$

$$\forall p \in \{2, \ldots, n\},\ \forall (k_1, \ldots, k_p) \in \mathbb{Z}^p, \quad |E\{c_j(k_1) \cdots c_j(k_p)\}| < \infty.$$


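Proof. This is easy to prove, by writing that:

$$\mathrm{cum}\{c_j(k_1), \ldots, c_j(k_n)\} = 2^{-nj/2} \int \cdots \int \mathrm{cum}\{X(t_1), \ldots, X(t_n)\}\, \psi\!\left(\frac{t_1}{2^j} - k_1\right) \cdots \psi\!\left(\frac{t_n}{2^j} - k_n\right) dt_1 \ldots dt_n.$$

By making a coordinate change, we have:

$$\mathrm{cum}\{c_j(k_1), \ldots, c_j(k_n)\} = 2^{nj/2} \int \cdots \int \mathrm{cum}\{X(2^j t_1'), \ldots, X(2^j t_n')\}\, \psi(t_1' - k_1) \cdots \psi(t_n' - k_n)\, dt_1' \ldots dt_n'.$$

Now, the self-similarity of the process combined with the multilinearity of the cumulants leads to the desired result. □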
Note that this behaviour is exactly the opposite of that encountered for stationary processes, where the cumulants decay exponentially when $j \to \infty$.

Self-similar processes with finite variance and stationary increments form a large class of processes, including the well-known fractional Brownian motion (fBm), which is the only Gaussian process in this class and which is a process with (possibly fractional) stationary increments of order $D$, $\forall D > H$ [9]. All these processes have the same second order statistics:

$$E\{X(t_1)X(t_2)\} = \frac{\sigma^2}{2}\left(|t_1|^{2H} + |t_2|^{2H} - |t_1 - t_2|^{2H}\right),$$

identical to those of the fBm. They thus present long-range dependence, which makes them interesting in modeling some natural phenomena.

The non-Gaussian processes of this class, subordinated to Brownian motion $B(t)$, have the following integral representation:

$$X_m(t) = \int \cdots \int \left\{ \int_0^t g(s - \xi_1, \ldots, s - \xi_m)\, ds \right\} dB(\xi_1) \ldots dB(\xi_m),$$

where $B(t)$ is an ordinary Brownian motion.

An interesting process within this class is the Hermite process defined by:

$$g(x_1, \ldots, x_m) = \prod_{i=1}^{m} (x_i)_+^{-(1+D)/2}, \quad 0 < D < 1/m$$

($x_+ = x$ if $x > 0$ and 0 otherwise)

which is self-similar with index $H = 1 - \frac{1}{2} mD \in (1/2, 1)$. It can be shown that all its higher order moments exist.

Now, we will focus on nonstationary processes with fractional stationary increments, as described in Section 2.1. Generalizing some works on the second-order case [8, 6], we have shown the following result:

Proposition 2. If a process $X(t)$ has $n$-th order stationary increments of order $D \in \mathbb{R}_+$ and its wavelet analysis is carried out with a wavelet having $r$ vanishing moments, $D < r$, then the wavelet coefficients are $n$-th order stationary.


3. Lower-Order Statistics

3.1. Background

The second part of this work is devoted to self-similar processes with non-Gaussian $\alpha$-stable distributions ($0 < \alpha < 2$). The family of stable distributions [2] is interesting, since linear transforms preserve the distribution of any linear combination of independent, identically distributed $\alpha$-stable random variables. Another desirable property is the generalized central limit theorem [11]. These processes have also turned out to be good models for many impulsive signals and noises, when the great variability of data yields probability distributions with "fat" tails. These distributions have infinite variance and undefined higher-order moments but it was pointed out in [1] that many signal processing algorithms based on second-order statistics can be transposed to fractional lower-order moments.

For the sake of simplicity, we will only consider $\alpha$-stable processes with $\alpha \neq 1$. However, it must be noted that our results may be extended to the case $\alpha = 1$, which corresponds to the Cauchy distribution. Let $(E, \mathcal{E}, m)$ be a measure space and $X(t)$ be an $\alpha$-stable process which admits the following integral representation

$$X(t) = \int_E f(t, x)\, M(dx), \quad t \in \mathbb{R} \qquad (4)$$

where $M$ is an $\alpha$-stable random measure on $(E, \mathcal{E})$ with control measure $m$ and

$$f \in L^\alpha(E, \mathcal{E}, m) = \left\{ f \mid f \text{ measurable and } \int_E |f(x)|^\alpha\, m(dx) < \infty \right\}.$$

Remark that most of the usual $\alpha$-stable processes admit such a representation. If $M$ is a symmetric $\alpha$-stable (S$\alpha$S) measure, then $X$ is also a S$\alpha$S process.

3.2. Results

If $X(t)$ is an $\alpha$-stable process given by (4), we first note [2] that the conditions

$$\ldots < \infty \quad \text{when } \alpha < 1 \qquad (5)$$

$$\ldots < \infty \quad \text{when } \alpha > 1 \qquad (6)$$

are equivalent to

$$\ldots\, dt < \infty \quad \text{almost surely.}$$

Consequently, when the conditions (5)-(6) are satisfied, the existence of the wavelet coefficients is guaranteed. In the sequel, we will assume that these conditions hold. A first important result is then
Proposition 3. Let $X(t)$ be an $\alpha$-stable (resp. S$\alpha$S) process satisfying (4). Its wavelet coefficients also form an $\alpha$-stable (resp. S$\alpha$S) process.

Proof. As (5)-(6) have been assumed, we can apply a Fubini-type theorem [2] which yields

$$c_j(k) = \int_E g_j(k, x)\, M(dx) \qquad (7)$$

and the function

$$g_j(k, x) = \frac{1}{2^{j/2}} \int_{-\infty}^{\infty} f(t, x)\, \psi\!\left(\frac{t}{2^j} - k\right) dt \qquad (8)$$

is in $L^\alpha(E, \mathcal{E}, m)$. Now, by using the properties of stable integrals, we conclude that $\{c_j(k), (j, k) \in \mathbb{Z}^2\}$ is an $\alpha$-stable process. □

Let us now consider an $\alpha$-stable process with an integral representation (4) and a skewness function $\beta(x)$. We further assume that the control measure $m$ is the Lebesgue measure on $E = \mathbb{R}$. Sufficient conditions for the process to be self-similar with index $H$ are: $\forall x \in \mathbb{R}$,

$$\forall t \in \mathbb{R},\ \forall a \in \mathbb{R}_+^*, \quad f(at, ax) = a^{H - \frac{1}{\alpha}} f(t, x) \qquad (9)$$

$$\beta(x) = \beta. \qquad (10)$$

Under these conditions, we can prove the following result:

Proposition 4. If a self-similar $\alpha$-stable process satisfies (9) and (10), then its wavelet coefficients form a self-similar process in the sense that

$$\forall j \in \mathbb{Z}, \quad \{c_j(k), k \in \mathbb{Z}\} \stackrel{d}{=} \left\{ 2^{j(H + \frac{1}{2})} c_0(k), k \in \mathbb{Z} \right\}. \qquad (11)$$

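Proof. According to Proposition 3, we have

$$c_{j+1}(k) = \int_{\mathbb{R}} g_{j+1}(k, x)\, M(dx).$$

Furthermore, by using (8) and (9), we find that

$$g_{j+1}(k, 2x) = 2^{\frac{1 - j}{2}} \int_{-\infty}^{\infty} f(2t, 2x)\, \psi\!\left(\frac{t}{2^j} - k\right) dt = 2^{H + \frac{1}{2} - \frac{1}{\alpha}}\, g_j(k, x).$$

By expressing the characteristic function of $c_{j+1}(k)$, it can be seen that this relation implies that

$$\{c_{j+1}(k), k \in \mathbb{Z}\} \stackrel{d}{=} \left\{ 2^{(H + \frac{1}{2})} c_j(k), k \in \mathbb{Z} \right\}$$

which leads to (11). □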
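The change of variable behind this proof can be spelled out as follows. This is a sketch under two assumptions that the extracted text does not show explicitly: the wavelet coefficients carry the normalization $2^{-j/2}$, and the self-similarity condition on the kernel reads $f(at, ax) = a^{H - 1/\alpha} f(t, x)$.

```latex
\begin{align*}
g_{j+1}(k, 2x)
  &= 2^{-\frac{j+1}{2}} \int f(t, 2x)\,
     \psi\!\left(\tfrac{t}{2^{j+1}} - k\right) dt
     && \text{definition of } g_{j+1} \\
  &= 2^{-\frac{j+1}{2}} \cdot 2 \int f(2s, 2x)\,
     \psi\!\left(\tfrac{s}{2^{j}} - k\right) ds
     && t = 2s \\
  &= 2^{\frac{1}{2}}\, 2^{H - \frac{1}{\alpha}} \cdot
     2^{-\frac{j}{2}} \int f(s, x)\,
     \psi\!\left(\tfrac{s}{2^{j}} - k\right) ds
     && \text{kernel scaling with } a = 2 \\
  &= 2^{H + \frac{1}{2} - \frac{1}{\alpha}}\, g_j(k, x).
\end{align*}
```

The remaining factor $2^{1/\alpha}$ needed to reach $2^{H + 1/2}$ comes from rescaling the random measure: for a Lebesgue control measure and a constant skewness, $M(2\,dx)$ has the same distribution as $2^{1/\alpha} M(dx)$.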


Under the same assumptions, this property allows us to show a power-law behaviour of the fractional lower-order moments of the wavelet coefficients of a self-similar process. Thus, the lower-order moments of order $p$, $0 < p < \alpha$, of the wavelet coefficients are, $\forall (j, k) \in \mathbb{Z}^2$,

$$E\{|c_j(k)|^p\} = 2^{jp(H + \frac{1}{2})}\, E\{|c_0(k)|^p\}.$$

A  similar  law  can  be  deduced  for  the  codifference  function 
of  two  wavelet  coefficients  at  the  same  scale:  V(j\  kj)  £ 

rc,(kj)  =  r,,(,),c,(0  = 

Now, let $X(t)$ be an $\alpha$-stable process given by (4) (where $m$ is still the Lebesgue measure). Then, a sufficient condition for $X(t)$ to have stationary increments is that there exists a function $\varphi_\tau(t)$ such that $\forall x \in \mathbb{R}$,

$$\forall \tau \in \mathbb{R}, \quad f(t + \tau, x) - f(t, x) = \varphi_\tau(t - x) \qquad (12)$$

and (10) holds.

In  this  context,  we  have: 

Proposition 5. The wavelet coefficients of an $\alpha$-stable process satisfying (4), (12) and (10) form stationary sequences at each scale.

Proof. As $L^\alpha(E, \mathcal{E}, m)$ is a vector space, the increment process $\Delta X(t; \tau) = X(t + \tau) - X(t)$ is an $\alpha$-stable process with

$$\Delta X(t; \tau) = \int_{-\infty}^{\infty} \varphi_\tau(t - x)\, M(dx).$$

Notice that this means that the increment process is a stationary moving average process.

We also remark that if $f(t, x)$ satisfies (12), then it can be written $f(t, x) = f_0(x) + f_1(t - x)$. Therefore, by using (8) and the fact that $\int \psi(t)\, dt = 0$, we have:

$$g_j(k, x) = \frac{1}{2^{j/2}} \int_{-\infty}^{\infty} f_1(t - x)\, \psi\!\left(\frac{t}{2^j} - k\right) dt = g_j(0, x - k 2^j).$$

It can be shown that this relation entails the stationarity of the wavelet coefficients. □

Notice that Propositions 4 and 5 can be applied to several $\alpha$-stable self-similar processes with stationary increments such as the linear fractional stable motion, the $\alpha$-stable Lévy motion, the log-fractional stable motion and many others.

To illustrate these theoretical results, we have simulated a well-balanced linear fractional S$\alpha$S motion (see Fig. 1). A spline wavelet decomposition of this process was realized over 8 resolution levels. In Fig. 2 the lower-order moments of order $p = \alpha/2$ of the wavelet coefficients, $E\{|c_j(k)|^{\alpha/2}\}$, are plotted in log-scales as a function of the resolution level $j$.
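A much simpler experiment in the same spirit can be sketched as follows. It is not the simulation reported in the paper: the process is an S$\alpha$S Lévy motion (self-similar with $H = 1/\alpha$) rather than a linear fractional stable motion, the wavelet is the Haar wavelet rather than a spline wavelet, and the moment order is $p = \alpha/4$ so that the sample moments have finite variance. The stable variates are drawn with the Chambers-Mallows-Stuck formula for the symmetric case; all function names are illustrative.

```python
import numpy as np

def sas_samples(alpha, size, rng):
    """Standard symmetric alpha-stable variates (Chambers-Mallows-Stuck, alpha != 1)."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

def haar_detail(x, j):
    """Haar detail coefficients of x at level j, normalized by 2^{-j/2}."""
    L = 2 ** j
    blocks = x[: (len(x) // L) * L].reshape(-1, L)
    half = L // 2
    return (blocks[:, :half].sum(axis=1) - blocks[:, half:].sum(axis=1)) / np.sqrt(L)

alpha = 1.5
p = alpha / 4                      # moment order, chosen below alpha/2 (assumption)
rng = np.random.default_rng(2)
levy = np.cumsum(sas_samples(alpha, 2 ** 17, rng))   # SaS Levy motion, H = 1/alpha

levels = np.arange(3, 9)
log_mom = [np.log2(np.mean(np.abs(haar_detail(levy, j)) ** p)) for j in levels]
slope = np.polyfit(levels, log_mom, 1)[0]            # expected slope: p * (H + 1/2)
H_hat = slope / p - 0.5
```

If the power law for the lower-order moments holds, `H_hat` should be close to $1/\alpha$; the regression is only a rough estimator and the level range is an arbitrary choice.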
4. Conclusion

In this paper we have firstly given sufficient conditions for the existence of the cumulants of the wavelet coefficients and then deduced a power-law behaviour for the cumulants of the self-similar processes with finite variance. Then, we have proved similar results for the lower-order moments of self-similar $\alpha$-stable processes.
Abstract

Tugnait has used the cross bispectrum to detect non-Gaussian signals common to two sensors when the noises at the two sensors are either mutually independent or have vanishing bispectra. However, the detection methods presented assume that enough data are available for asymptotic results to apply. If this assumption is not valid then the performance of the detection methods will be degraded. In this paper, we propose a detection scheme based on the bootstrap that handles the small data size case. Unlike other bispectrum based techniques, the proposed scheme maintains the nominal test level while achieving high power. Simulation examples are given and the performance of the bootstrap based method is compared with a method proposed by Tugnait.


1  Introduction 

The cross bispectrum of two stationary, zero-mean, discrete-time random processes, $x(t)$ and $y(t)$, is defined as

$$B_{xxy}(\omega_1, \omega_2) = \sum_{i=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} E[x(t + i)\, x(t + k)\, y(t)]\, \exp\{-j(\omega_1 i + \omega_2 k)\}, \qquad (1)$$

where $E[\,\cdot\,]$ denotes the expectation operator. It is well known that the expression in (1) is zero when $x(t)$ and $y(t)$ are Gaussian or statistically independent random processes. This result makes the cross bispectrum a valuable tool for analysing non-Gaussian signals common to $x(t)$ and $y(t)$.

With  this  in  mind,  Tugnait  [8]  proposes  several  tests  for 
detecting  a  random  non-Gaussian  signal  in  noise.  Such  tests 
are  useful  for  verifying  the  validity  of  estimated  parameters 
such  as  the  differential  time  delay  from  two-sensor  data. 
They  are  essentially  cross  bispectral  versions  of  the  tests 
for  zero  bispectrum  proposed  by  Subba  Rao  and  Gabr  [7] 


and  Hinich  [3].  The  methods  are  all  based  on  asymptotic 
distributions. 

Recently,  work  has  been  done  to  remove  the  reliance  on 
asymptotic  results  for  bispectral  tests  by  incorporating  the 
bootstrap  [11,  12].  We  extend  the  idea  to  cross  bispectral 
tests. 

The  bootstrap  is  a  statistical  tool  proposed  by  Efron  [1] 
useful  for  distribution  estimation.  It  may  be  seen  as  the 
marriage  of  data  resampling  and  computer  simulation  to 
give  rise  to  techniques  that  are  able  to  handle  small  data 
sizes  (see,  for  example,  [9, 10]). 

The advantage of a detection method using less data is at least twofold. Firstly, the method is applicable to cases where limited data are available. Secondly, the method is applicable to nonstationary data if we can assume that stationarity holds for a short segment of that data.

In the next section, we specify our statistical hypotheses and test statistic. Section 3 explains how we incorporate the bootstrap into the proposed test. Section 4 provides simulation examples. There, the proposed method is compared with an existing test. We draw our conclusions in Section 5.

2  Statistical  Test 

For a finite length of data, $x(t)$, $t = 1, \ldots, P \times N$, divide $x(t)$ into $P$ non-overlapping segments of $N$ consecutive measurements, $x^{(i)}(t)$, $i = 1, \ldots, P$. Similarly for $y(t)$, we have $y^{(i)}(t)$, $i = 1, \ldots, P$.

We test for zero cross bispectrum in the principal domain of the discrete version of $B_{xxy}(\omega_1, \omega_2)$,

$$B(m, n) = B_{xxy}(2\pi m/N, 2\pi n/N), \quad (m, n) \in \mathcal{D},$$

where the principal domain is given by

$$\mathcal{D} = \{(m, n) : |n| \le m \le N/2\}.$$

By using symmetric and periodic properties of the cross bispectrum, all other points outside $\mathcal{D}$ may be obtained.
Thus, we specify the null and alternative hypotheses as

$$H_0: B(m, n) = 0 \text{ for all } (m, n) \in \mathcal{D},$$

$$H_1: B(m, n) \neq 0 \text{ for some } (m, n) \in \mathcal{D}.$$

To test the null hypothesis $H_0$, we choose a test statistic given by the sum of the absolute values of Student's t statistics,

$$T = \sum_{(m,n) \in \mathcal{D}'} \frac{|\hat{B}(m, n) - 0|}{\hat{\sigma}(m, n)},$$

where $\hat{B}(m, n)$ estimates $B(m, n)$ and $\hat{\sigma}(m, n)^2$ estimates $\mathrm{var}\{\hat{B}(m, n)\}$. Multivariate or multiple hypotheses testing is avoided by the summation, and the use of Student's t statistics is justified by [2]. We discuss a bootstrap estimator for $\hat{\sigma}(m, n)^2$ in the next section.

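For $B(m, n)$, we use the following estimator [8, 5]:

$$\hat{B}(m, n) = \frac{1}{P} \sum_{i=1}^{P} I^{(i)}(m, n). \qquad (2)$$

The cross biperiodograms $I^{(i)}(m, n)$, $i = 1, \ldots, P$, are given by

$$I^{(i)}(m, n) = \frac{1}{N}\, X^{(i)}(m)\, X^{(i)}(n)\, \bar{Y}^{(i)}(m + n), \qquad (3)$$

where $X^{(i)}(m)$ and $Y^{(i)}(m)$ are the DFTs of the $i$th segment of $x(t)$ and $y(t)$ respectively,

$$X^{(i)}(m) = \sum_{t=0}^{N-1} x^{(i)}(t) \exp\{-j 2\pi m t / N\},$$

$$Y^{(i)}(m) = \sum_{t=0}^{N-1} y^{(i)}(t) \exp\{-j 2\pi m t / N\},$$

$$m = 0, \ldots, N - 1,$$

and $\bar{Y}^{(i)}(m)$ is the complex conjugate of $Y^{(i)}(m)$. Note that $P$ and $N$ must be chosen such that $N/P \to 0$ as the data size increases to ensure consistency of the estimator.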
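The segment-averaged estimator can be sketched in a few lines. The code below is only an illustration under stated assumptions: the biperiodogram is normalized by $1/N$, the frequency index $m + n$ is taken modulo $N$, the estimate is returned on the full $N \times N$ grid rather than on the principal domain only, and the function name `cross_bispectrum_estimate` is hypothetical.

```python
import numpy as np

def cross_bispectrum_estimate(x, y, N):
    """Average the cross biperiodograms of P = len(x) // N non-overlapping segments.

    Returns (B_hat, I): B_hat has shape (N, N), I has shape (P, N, N).
    """
    P = len(x) // N
    xs = np.asarray(x[: P * N], dtype=float).reshape(P, N)
    ys = np.asarray(y[: P * N], dtype=float).reshape(P, N)
    X = np.fft.fft(xs, axis=1)            # unnormalized DFT of each segment
    Y = np.fft.fft(ys, axis=1)
    m = np.arange(N)
    idx = (m[:, None] + m[None, :]) % N   # index of frequency m + n (periodic)
    # I[i, m, n] = X_i(m) X_i(n) conj(Y_i(m + n)) / N
    I = X[:, :, None] * X[:, None, :] * np.conj(Y[:, idx]) / N
    return I.mean(axis=0), I

# Sanity check with a unit impulse at the start of every segment:
# every DFT bin equals 1, so every biperiodogram ordinate equals 1/N.
N_seg, P_seg = 4, 3
impulses = [1.0, 0.0, 0.0, 0.0] * P_seg
B_hat, I = cross_bispectrum_estimate(impulses, impulses, N_seg)
```

In practice only the bifrequencies of the principal domain would be retained, since the remaining values follow from the symmetries mentioned above.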


3  Bootstrap  Method 

Table 1 gives the proposed detection scheme. We first estimate the cross bispectrum in Step 1.

In Step 2, we estimate approximately i.i.d. (independent and identically distributed) errors under the asymptotic results [8, 4],

$$E\{\hat{B}(m, n)\} = B(m, n) + O(N^{-1}),$$

$$\mathrm{var}\{\mathrm{Re}[I^{(i)}(m, n)]\} = \mathrm{var}\{\mathrm{Im}[I^{(i)}(m, n)]\} = \frac{N}{2}[1 + \delta(m - n)]\, S_x(m)\, S_x(n)\, S_y(m + n) + O(1), \quad (m, n) \in \mathcal{D}',$$

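Table 1. Bootstrap based algorithm.

1. Calculate $I^{(i)}(m, n)$ and $\hat{B}(m, n)$ from (3) and (2).

2. For each segment and $(m, n) \in \mathcal{D}'$, form residuals

   $$\varepsilon^{(i)}(m, n) = \frac{I^{(i)}(m, n) - \hat{B}(m, n)}{\sqrt{\hat{V}(m, n)}},$$

   where

   $$\hat{V}(m, n) = N[1 + \delta(m - n)]\, \hat{S}_x(m)\, \hat{S}_x(n)\, \hat{S}_y(m + n).$$

3. Centre the residuals, $\tilde{\varepsilon}^{(i)}(m, n) = \varepsilon^{(i)}(m, n) - \bar{\varepsilon}^{(i)}$, where $\bar{\varepsilon}^{(i)}$ represents an average over all $(m, n)$.

4. Repeat $B_1$ times: form bootstrap cross biperiodograms,

   $$I^{(i)*}(m, n) = \hat{B}(m, n) + \tilde{\varepsilon}^{(i)*}(m, n) \sqrt{\hat{V}(m, n)},$$

   where $\tilde{\varepsilon}^{(i)*}(m, n)$ is randomly drawn, with replacement, from $\{\tilde{\varepsilon}^{(i)}(m, n), \forall (m, n) \in \mathcal{D}'\}$. We obtain $I_b^{(i)*}(m, n)$, $b = 1, \ldots, B_1$, and find

   $$\hat{B}_b^*(m, n) = \frac{1}{P} \sum_{i=1}^{P} I_b^{(i)*}(m, n).$$

5. Calculate the test statistic,

   $$T = \sum_{(m,n) \in \mathcal{D}'} \frac{|\hat{B}(m, n)|}{\hat{\sigma}(m, n)}.$$

6. Calculate the bootstrap statistics,

   $$T_b^* = \sum_{(m,n) \in \mathcal{D}'} \frac{|\hat{B}_b^*(m, n) - \hat{B}(m, n)|}{\hat{\sigma}_b^*(m, n)},$$

   $b = 1, \ldots, B_1$, where $\hat{\sigma}_b^*(m, n)$ is obtained in a similar way as $\hat{\sigma}(m, n)$.

7. Rank $T_1^*, \ldots, T_{B_1}^*$ in increasing order to obtain $T_{(1)}^* \le \cdots \le T_{(B_1)}^*$.

8. Reject the null hypothesis if

   $$T > T^*_{(\lfloor (B_1 + 1)(1 - \alpha) \rfloor)},$$

   where $\alpha$ is the nominal level of the test.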
where $S_x(m)$ and $S_y(m)$ are the discrete power spectra of $x(t)$ and $y(t)$ respectively. The region $\mathcal{D}'$ is $\mathcal{D}$ excluding $(m, -m)$, $m = 0, \ldots, N/2$ and $(N/2, 0)$, the bifrequencies where the cross bispectrum is real. We first approximate $B(m, n)$, $S_x(m)$ and $S_y(m)$ by consistent estimators such as (2) and the smoothed periodogram [6]. By subtracting $\hat{B}(m, n)$ and dividing by $\sqrt{\hat{V}(m, n)}$ (with $\hat{V}(m, n)$ as given in Step 2), we subsequently obtain the residuals.

We force the residuals to be zero mean in Step 3 and randomly sample these in Step 4. With each resulting bootstrap residual, we get a new cross biperiodogram ordinate, $I^{(i)*}(m, n)$, by adding back the mean and variance information removed in Step 2. Applying (2) on cross biperiodograms formed in this way, $B_1$ bootstrap cross bispectrum estimates can be obtained. Ideally, $B_1$ is chosen as large as possible to approximate the sampling distribution of $\hat{B}(m, n)$ given $\{\tilde{\varepsilon}^{(i)}(m, n), \forall (m, n) \in \mathcal{D}'\}$ as the underlying population of errors. The test statistic is calculated in Step 5.
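The residual resampling of Steps 2 to 4 can be sketched as below. This is not the authors' implementation: the residuals are centred and pooled over all segments and bifrequencies (the text leaves the pooling ambiguous), the bifrequencies of $\mathcal{D}'$ are assumed to be flattened into a single axis of length `K`, and the names `bootstrap_estimates` and `bootstrap_variance` are hypothetical.

```python
import numpy as np

def bootstrap_estimates(I, V, n_boot, rng):
    """Bootstrap replicates of the cross bispectrum estimate.

    I : (P, K) complex biperiodogram ordinates (P segments, K bifrequencies)
    V : (K,) positive variance scale for each bifrequency (Step 2)
    Returns (B_hat, B_star) with B_star of shape (n_boot, K).
    """
    P, K = I.shape
    scale = np.sqrt(V)
    B_hat = I.mean(axis=0)
    resid = (I - B_hat) / scale            # Step 2: standardized residuals
    resid = resid - resid.mean()           # Step 3: centring (pooled, an assumption)
    pool = resid.ravel()
    B_star = np.empty((n_boot, K), dtype=complex)
    for b in range(n_boot):
        draw = rng.choice(pool, size=(P, K), replace=True)   # Step 4: resample
        I_star = B_hat + draw * scale      # add mean and variance back
        B_star[b] = I_star.mean(axis=0)    # apply estimator (2)
    return B_hat, B_star

def bootstrap_variance(B_star):
    """Sample variance over replicates, with the 1/(B - 1) normalization."""
    return np.var(B_star, axis=0, ddof=1)

rng = np.random.default_rng(3)
I_const = np.full((4, 5), 2.0 + 1.0j)      # degenerate case: no variability at all
B_hat, B_star = bootstrap_estimates(I_const, np.ones(5), 10, rng)
sigma2 = bootstrap_variance(B_star)
```

In the degenerate example every residual is zero, so all replicates reproduce the original estimate and the bootstrap variance vanishes, which is a convenient sanity check for the bookkeeping.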

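The nested bootstrap may be used to compute $\hat{\sigma}(m, n)^2$, which estimates $\mathrm{var}\{\hat{B}(m, n)\}$. Apply Step 4 using $B_2$ instead of $B_1$ samples. Then,

$$\hat{\sigma}(m, n)^2 = \frac{1}{B_2 - 1} \sum_{b=1}^{B_2} \left| \hat{B}_b^*(m, n) - \frac{1}{B_2} \sum_{b'=1}^{B_2} \hat{B}_{b'}^*(m, n) \right|^2. \qquad (4)$$

The standard number of bootstrap samples for estimating variances is $B_2 = 25$ [1].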

Steps 6 and 7 involve calculating test statistics based on the bootstrap cross bispectrum estimates and ranking them to form an empirical population under $H_0$. The nested bootstrap may be used to find $\hat{\sigma}_b^*(m, n)^2$, $b = 1, \ldots, B_1$, in the same way as for $\hat{\sigma}(m, n)^2$. Apply Step 4 using $B_2$ bootstrap samples and use (4). However, sample from $\{\tilde{\varepsilon}_b^{(i)*}(m, n), \forall (m, n) \in \mathcal{D}'\}$ rather than from $\{\tilde{\varepsilon}^{(i)}(m, n), \forall (m, n) \in \mathcal{D}'\}$, where $\tilde{\varepsilon}_b^{(i)*}(m, n)$ is $\varepsilon_b^{(i)*}(m, n)$ after detrending as in Step 3.

The hypothesis test is Step 8, using the empirical population formed in the preceding steps. We conclude that a common non-Gaussian signal is present if $T$ is larger than $100(1 - \alpha)\%$ of the bootstrap test statistics.
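The decision rule of Steps 7 and 8 amounts to comparing $T$ with an order statistic of the bootstrap statistics. The sketch below assumes the threshold index $\lfloor (B_1 + 1)(1 - \alpha) \rfloor$ with 1-based ranking; the small constant added before flooring is a guard against floating-point rounding and is not part of the method, and the function names are illustrative.

```python
import math

def bootstrap_threshold(t_star, alpha):
    """Order statistic T*_(k) with k = floor((B + 1)(1 - alpha)), 1-based."""
    ranked = sorted(t_star)                          # Step 7: rank in increasing order
    B = len(ranked)
    k = math.floor((B + 1) * (1 - alpha) + 1e-9)     # guard against rounding error
    k = min(max(k, 1), B)                            # keep the index within range
    return ranked[k - 1]

def reject_null(T, t_star, alpha):
    """Step 8: reject H0 when T exceeds the bootstrap threshold."""
    return T > bootstrap_threshold(t_star, alpha)

# With B1 = 199 and alpha = 0.05 the threshold is the 190th ranked value.
toy_statistics = list(range(1, 200))
threshold = bootstrap_threshold(toy_statistics, 0.05)
```

Choosing $B_1$ so that $(B_1 + 1)(1 - \alpha)$ is an integer, as with $B_1 = 199$ at the 5% level, avoids any ambiguity in the ranking index.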


4  Simulation  Examples 

Using $B_1 = 199$ and $B_2 = 25$, we simulated 1000 records of an MA(10) signal common to both sensors in six different noise scenarios and for SNRs $-\infty$ dB (no signal), 0 dB, 10 dB, 20 dB and 40 dB. Each record was 512 data points long. We ran the algorithm in Table 1 on each record, which was divided into 32 segments of 16 data points. For the smoothed periodogram estimates of spectra, we used the Bartlett-Priestley spectral window given in [6] with $M = 4$.

To compare the performance against existing tests, we selected the GLRT-2 from [8] and applied it over the same noise scenarios and SNRs. Preliminary simulations led us to the optimum settings: 4 segments of 128 data points, and a square fine grid of 169 points ($r = 6$).

We present here only a subset of the results which highlights the ability of the bootstrap based method to maintain the nominal level for short records of data.

4.1 Example 1: common linear non-Gaussian signal in common Gaussian noise

This example is based on Example 3 of [8]. The measurements at the two sensors are given by

$$x(t) = A s(t) + n_1(t)$$

$$y(t) = A s(t - 5) + n_1(t - 10)$$

where

$$s(t) = 0.04e(t) - 0.05e(t - 1) + 0.07e(t - 2) - 0.21e(t - 3) - 0.5e(t - 4) + 0.72e(t - 5) + 0.36e(t - 6) + 0.21e(t - 8) + 0.03e(t - 9) + 0.07e(t - 10),$$

and $e(t)$ is an i.i.d., zero-mean, exponential random process. The common noise sequence is an AR(1) Gaussian sequence given by

where $\omega(t)$ is an i.i.d. standard Gaussian process. The constant $A$ is chosen to achieve a desired SNR.
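The two-sensor records of this example can be generated along the following lines. The sketch makes several assumptions: the zero-mean exponential innovations are obtained by subtracting the mean from unit-rate exponential variates, the coefficient of $e(t - 7)$ is set to zero because no such term appears in the text, and the noise sequence is supplied by the caller because its AR(1) recursion is not given above. The names `ma_signal` and `two_sensor_records` are hypothetical.

```python
import numpy as np

# MA coefficients of s(t) for lags 0..10; lag 7 does not appear in the text.
MA_COEFFS = np.array([0.04, -0.05, 0.07, -0.21, -0.5, 0.72,
                      0.36, 0.0, 0.21, 0.03, 0.07])

def ma_signal(e):
    """Filter the innovations e(t) with the MA coefficients (causal, same length)."""
    return np.convolve(e, MA_COEFFS)[: len(e)]

def two_sensor_records(n, A, noise, rng):
    """Return x(t) = A s(t) + n1(t) and y(t) = A s(t-5) + n1(t-10), t = 0..n-1.

    noise must hold n + 10 samples of the common noise n1 (caller-supplied).
    """
    e = rng.exponential(1.0, n + 10) - 1.0      # i.i.d. zero-mean exponential
    s = ma_signal(e)
    x = A * s[10:] + noise[10:]                 # s(t), n1(t)
    y = A * s[5: n + 5] + noise[:n]             # s(t - 5), n1(t - 10)
    return x, y

rng = np.random.default_rng(4)
n = 512
x_rec, y_rec = two_sensor_records(n, 1.0, np.zeros(n + 10), rng)
impulse_response = ma_signal(np.r_[1.0, np.zeros(11)])
```

With the noise switched off, `y_rec` is simply `x_rec` delayed by five samples, which matches the differential time delay built into the example.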

Results are given in Figure 1. Both methods, the proposed bootstrap based test and the GLRT-2, maintain the nominal level of significance (SNR = $-\infty$ dB). The GLRT-2, however, has noticeably higher probabilities of detection than the bootstrap based method.

Figure 2. Example 2 detection probability versus nominal test level for SNRs $-\infty$ dB, 0 dB and 10 dB.

4.2 Example 2: common linear non-Gaussian signal in independent non-Gaussian noise

This example is based on Example 4 of [8]. The sensor measurements are

$$x(t) = A_1 s(t) + n_1(t)$$

$$y(t) = A_2 s(t - 5) + n_2(t)$$

where $s(t)$ is as in Example 1. The noise sequences are independent AR(1) exponential sequences given by

$$n_i(t) = -0.8\, n_i(t - 1) + u_i(t), \quad i = 1, 2,$$

where $u_1(t)$ and $u_2(t)$ are independent of each other and of $s(t)$, and are i.i.d., zero-mean, exponential processes. The constants $A_1$ and $A_2$ are chosen to achieve a desired SNR at each sensor respectively.

Results  are  given  in  Figure  2.  The  GLRT-2  has  higher 
probability  of  detection  than  the  bootstrap  based  method 
as  in  Example  1.  However,  the  bootstrap  based  method 
continues  to  maintain  a  low  false  alarm  rate  whereas  the 
GLRT-2’s  false  alarm  rate  is  much  higher  than  the  nominal 
level. 

Additional results using other noise structures show that the bootstrap based method maintains the nominal level while keeping the probability of detection high. The GLRT-2 is unable to maintain the nominal level for noises based on the exponential distribution. This can be seen by comparing results in Table 2 and Table 3.

Table 2. GLRT-2 results for 512 data points, SNR = 10 dB, nominal level at 5%.

Noise Used                             % False Alarms   % Detected
Common Gaussian noise         i.i.d.         5              99
                              AR(1)          6              99
                              AR(5)          6              99
Independent exponential noise i.i.d.        31              99
                              AR(1)         30              99
                              AR(5)         28              99

Table 3. Bootstrap results for 512 data points, SNR = 10 dB, nominal level at 5%.

Noise Used                             % False Alarms   % Detected
Common Gaussian noise         i.i.d.         4              93
                              AR(1)          1              94
                              AR(5)          1              93
Independent exponential noise i.i.d.         8              94
                              AR(1)          1              93
                              AR(5)          1              95

The AR(5) noise used was

$$n_i(t) = 0.5\, n_i(t - 1) - 0.6\, n_i(t - 2) + 0.3\, n_i(t - 3) - 0.4\, n_i(t - 4) + 0.2\, n_i(t - 5) + \omega_i(t),$$

where $\omega_1(t)$ is i.i.d. standard Gaussian for common Gaussian noises and $\omega_i(t)$, $i = 1, 2$, are mutually independent, i.i.d., zero-mean exponential for independent exponential noises.

5  Conclusions 

We  have  presented  a  new  method  based  on  the  bootstrap 
for  detecting  a  non-Gaussian  signal  from  the  measurements 
of  two  sensors.  From  simulation  results,  we  have  shown  that 
the  method  is  able  to  maintain  a  low  false  alarm  rate  across 
six  noise  scenarios  for  short  data  records  (512  points).  The 
GLRT-2  was  found  to  have  a  high  probability  of  detection 
under  the  same  conditions  but  its  probability  of  false  alarm 
was  found  to  be  much  higher  than  the  nominal  level  for 
exponential  noises. 